Accurate organ segmentation is crucial for precise medical diagnosis. Recent CNN- and Transformer-based methods have substantially advanced automatic medical image segmentation. However, their encoders and decoders typically communicate through simple skip connections, which fail to integrate multi-scale features effectively, causing a misalignment between low-resolution global features and high-resolution spatial information. As a result, segmentation accuracy suffers, particularly for global contours and local details. To address this limitation, MILENet, a multi-scale interaction and locally enhanced bridging network, is proposed. Its context bridge incorporates a multi-scale interaction module that reorganizes multi-scale features and preserves global correlation. In addition, a local enhancement module is introduced, comprising a dilated coordinate attention mechanism and a locally enhanced FFN built on a cascaded convolutional structure; this module strengthens local context modeling and improves feature discrimination. Furthermore, a source-driven connection mechanism preserves detailed information across layers, providing richer features for decoder reconstruction. By combining these components, MILENet effectively aligns multi-scale features and enhances local details, thereby improving segmentation accuracy. MILENet has been evaluated on publicly available datasets spanning abdominal CT (Synapse), cardiac MRI (ACDC), and colonoscopy RGB images (Kvasir, CVC-ClinicDB, CVC-ColonDB, CVC-300, and ETIS-LaribDB). The results show that MILENet achieves state-of-the-art performance across modalities, handling both large-organ segmentation in CT/MRI and fine-grained polyp delineation in endoscopic images, and demonstrating strong generalizability to diverse anatomical structures and imaging conditions. The code has been released on GitHub: https://github.com/syzhou1226/MILENET.
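The dilated attention and cascaded convolutional structure described above rest on a standard property of dilated convolutions: stacking them enlarges the effective receptive field much faster than plain convolutions, at no extra parameter cost. A minimal sketch of that receptive-field arithmetic, with kernel sizes and dilation rates chosen purely for illustration (they are not taken from the paper):

```python
def receptive_field(layers):
    """Effective 1-D receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs; each layer
    adds (kernel_size - 1) * dilation to the running receptive field.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Three plain 3x3 convolutions vs. three dilated ones (rates 1, 2, 3):
plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # -> 7
dilated = receptive_field([(3, 1), (3, 2), (3, 3)])  # -> 13
print(plain, dilated)
```

With the same three 3x3 layers, the cascaded dilated variant nearly doubles the context each output pixel sees, which is the intuition behind using such a cascade to model local context without sacrificing resolution.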