UNet has achieved remarkable results in the semantic segmentation of medical images. The rapid development of large language models has recently set new milestones in artificial intelligence, motivating us to carry over the techniques behind their success to neural networks in computer vision. This study incorporates the self-attention mechanism into the UNet architecture so that each pixel can attend to global context, strengthening the relationships between features. Experiments on medical image datasets indicate that the enhanced model significantly improves segmentation accuracy and robustness. These results demonstrate the potential of the self-attention mechanism for improving medical image segmentation.
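
To make the idea of pixel-level self-attention concrete, the following is a minimal sketch of standard scaled dot-product self-attention applied to a small feature map, where every pixel is treated as a token and attends to every other pixel. It is an illustration under stated assumptions, not the implementation used in this study: the function name `pixel_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the single-head design are all hypothetical, and the abstract does not specify where in the UNet the attention is placed.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as nested lists (rows of a, rows of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pixel_self_attention(feature_map, w_q, w_k, w_v):
    """Single-head self-attention over the pixels of an H x W x C feature map.

    w_q, w_k, w_v are hypothetical C x D projection matrices; in a real
    network they would be learned. Returns the H x W x D output map and
    the (H*W) x (H*W) attention weights.
    """
    h, w = len(feature_map), len(feature_map[0])
    # Flatten the spatial grid so that each pixel becomes one token.
    tokens = [feature_map[i][j] for i in range(h) for j in range(w)]
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    d = len(q[0])
    # Scores between every pair of pixels: Q K^T / sqrt(d).
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each output pixel is a weighted sum over the values of all pixels.
    out = matmul(weights, v)
    out_map = [out[i * w:(i + 1) * w] for i in range(h)]
    return out_map, weights


# Toy 2 x 2 feature map with 2 channels and identity projections (assumed).
identity = [[1.0, 0.0], [0.0, 1.0]]
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
out_map, weights = pixel_self_attention(fmap, identity, identity, identity)
```

Each row of `weights` has one entry per pixel in the map, which is what gives every pixel access to global information rather than only a local receptive field. The cost grows quadratically with the number of pixels, so in practice such a block would more plausibly be applied to low-resolution feature maps, although the abstract does not say which design was chosen.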