Reconfigurable Intelligent Surfaces (RIS) have emerged as a key enabling technology for beyond-5G and 6G wireless networks, offering programmable control over the radio propagation environment at very low power consumption. However, jointly optimizing the base station (BS) beamforming vector and the high-dimensional RIS phase configuration remains fundamentally challenging due to the non-convex coupling between the two, hardware constraints, imperfect channel knowledge, and fast-varying user mobility. Traditional optimization-based approaches, such as alternating optimization and convex relaxation, scale poorly with large RIS arrays and cannot adapt efficiently to rapidly changing channel conditions. To address these limitations, this work proposes a deep reinforcement learning (DRL) framework that learns an adaptive control policy through direct interaction with the wireless environment, without requiring explicit channel models or handcrafted optimization procedures. The proposed actor–critic architecture jointly outputs continuous beamforming and RIS phase-shift actions and incorporates domain-specific reward shaping to balance spectral efficiency, energy consumption, and phase-switching smoothness. Comprehensive experiments across diverse propagation scenarios (shadowing variations, multipath sparsity levels, mobile users, and hardware ablation settings) demonstrate that the proposed method achieves significantly higher achievable rates, energy efficiency, and robustness than conventional baselines, while maintaining online inference fast enough for real-time 6G deployments. The results confirm that DRL-driven beamforming provides a scalable and model-agnostic solution for next-generation intelligent wireless environments.
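As a rough illustration of the kind of policy head the abstract describes, the sketch below (PyTorch) shows a joint actor that emits a power-normalized beamforming vector and per-element RIS phase shifts, together with a shaped reward trading off rate, transmit power, and phase-switching smoothness. The layer sizes, antenna and element counts, reward weights, and names such as `JointActor` and `shaped_reward` are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal sketch, assuming a PyTorch actor head for joint BS beamforming and
# RIS phase control. All dimensions and weights below are assumptions.
import torch
import torch.nn as nn

NUM_ANTENNAS = 8        # assumed BS antenna count
NUM_RIS_ELEMENTS = 64   # assumed RIS element count
STATE_DIM = 2 * NUM_ANTENNAS * NUM_RIS_ELEMENTS  # e.g. stacked Re/Im channel features


class JointActor(nn.Module):
    """Maps a channel-state observation to continuous beamforming and RIS actions."""

    def __init__(self, state_dim: int = STATE_DIM, hidden: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Real/imaginary parts of the BS beamforming vector.
        self.bf_head = nn.Linear(hidden, 2 * NUM_ANTENNAS)
        # One phase angle per RIS element (unit-modulus constraint handled downstream).
        self.phase_head = nn.Linear(hidden, NUM_RIS_ELEMENTS)

    def forward(self, state: torch.Tensor):
        h = self.backbone(state)
        bf = self.bf_head(h)
        # Project onto the transmit-power constraint (unit-norm beamformer).
        bf = bf / (bf.norm(dim=-1, keepdim=True) + 1e-8)
        # Map to phases in [-pi, pi]; the RIS would apply exp(j * theta) per element.
        theta = torch.pi * torch.tanh(self.phase_head(h))
        return bf, theta


def shaped_reward(rate, power, phase_delta, w_rate=1.0, w_pow=0.1, w_smooth=0.01):
    """Hypothetical reward shaping: spectral efficiency minus energy and
    phase-switching penalties, mirroring the trade-off described in the abstract."""
    return w_rate * rate - w_pow * power - w_smooth * phase_delta.abs().mean()
```

For example, passing a batch of channel-state vectors through `JointActor` yields a normalized beamformer and a phase vector in one forward pass, which a critic network would then score during off-policy training; the exact training loop and critic design are outside this sketch.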



