This thesis presents a comprehensive study of dynamic balance control for a humanoid robot, the Robinion2S. It begins with a detailed review of humanoid robot platforms, covering their mechatronic systems, walking gait algorithms, and perception systems. Special focus is given to the Robinion2S, the latest version of the Robinion series, which forms the backbone of the study. The experiments reveal the limitations of traditional PID-based balance control when deployed in a complex, dynamic environment such as a balance board. Even after optimization with a high-throughput random search algorithm in Nvidia’s Isaac Gym simulation environment, the PID-based approach fails to maintain balance consistently. This result motivates the search for more robust control strategies, and the research therefore turns to reinforcement learning for balance control. Where traditional control methods struggle, reinforcement learning shows potential as a viable solution to the intricacies of balance control. The trained reinforcement learning models demonstrate adaptability and robustness in maintaining balance, hinting at their potential to solve more complex control problems. To extend the study to real-world applications, a Sim2Real approach is developed that deploys the trained reinforcement learning models in a dynamic, physical environment. Although the results fall short of ideal, the approach demonstrates the potential for trained models to transfer control policies effectively from simulation to the real environment. This thesis contributes candidate methods for balance control in humanoid robots and motivates a shift from traditional control methods to more robust reinforcement learning techniques. The results, while not ideal, show significant potential for future research and advancement in robotics.
This research provides a foundation for understanding balance control in humanoid robots and for developing strategies to optimize their performance in real-world environments.