This study develops a deep learning approach, combined with natural language processing, for automatically transcribing court proceedings. It targets two limitations of current speech recognition tools: poor handling of legal terminology and difficulty distinguishing between courtroom speakers. The proposed method combines advanced audio processing, speaker role identification, and error correction techniques, the latter including a BERT-based model and an N-gram model. The aim is to improve the accuracy of court transcripts, reduce manual effort, and increase the reliability of the resulting legal documents.
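To make the N-gram error correction idea concrete, the following is a minimal sketch, not the study's actual model. It assumes a bigram model with add-one smoothing trained on a small in-domain corpus, and a hand-written table of confusable words. The names `train_bigrams`, `bigram_prob`, `correct`, and the `confusions` table are hypothetical and introduced only for illustration.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # "<s>" marks the sentence start so the first word has a context.
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def correct(tokens, confusions, unigrams, bigrams):
    """Greedy left-to-right correction.

    For each token, compare it with its listed confusable alternatives
    (hypothetical table) and keep whichever is most probable given the
    previously chosen word. Ties keep the original token.
    """
    vocab_size = len(unigrams)
    corrected = []
    prev = "<s>"
    for token in tokens:
        candidates = [token] + confusions.get(token, [])
        best = max(
            candidates,
            key=lambda w: bigram_prob(prev, w, unigrams, bigrams, vocab_size),
        )
        corrected.append(best)
        prev = best
    return corrected


# Illustrative data only: a toy legal corpus and an assumed confusion table.
corpus = [
    "the plaintiff filed a motion",
    "the defendant filed a plea",
    "counsel for the plaintiff objects",
]
confusions = {"plane": ["plaintiff"], "please": ["plea"]}

unigrams, bigrams = train_bigrams(corpus)
print(correct("the plane filed a motion".split(), confusions, unigrams, bigrams))
```

A greedy pass like this only looks one word back. A BERT-based corrector, as named in the abstract, would instead score candidates using context on both sides, which is one plausible reason to pair the two models.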