Acute kidney injury (AKI) represents a critical clinical challenge in intensive care units, affecting approximately 20–50% of critically ill patients, with mortality rates exceeding 50% in severe cases. The sequential nature of AKI management—involving continuous decisions about diuretics, fluid balance, and renal replacement therapy (RRT)—presents a natural application for reinforcement learning (RL). However, the inherent risks of deploying learned policies in healthcare necessitate robust offline learning approaches that can leverage historical electronic health records without direct patient interaction. This paper introduces World Model Enhanced Offline Reinforcement Learning (WME-ORL), a novel framework that integrates an ensemble Fourier Neural Operator–Transformer (FNO-Transformer) world model with stage-aware Implicit Q-Learning (IQL) for optimizing sequential interventions in AKI patients. Our approach addresses three fundamental challenges in medical offline RL: distribution shift, via uncertainty-penalized value estimation; clinical safety, via explicit rule integration; and patient heterogeneity, via AKI stage-aware policy adaptation. Extensive experiments on 46,337 ICU patients from MIMIC-IV demonstrate that WME-ORL achieves superior policy value (interquartile-normalized return, IQNR: 0.82 vs. 0.72 for standard IQL), reduces predicted RRT initiation rates by 31.9%, and keeps clinical rule violations below 5%. Ablation studies reveal that the FNO-Transformer architecture provides the most reliable uncertainty estimates among the compared architectures, a critical property for safe clinical deployment.
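The uncertainty-penalized value estimation mentioned above can be sketched in its simplest form: an ensemble of value (or world-model) heads scores each state-action pair, and the policy is trained against a pessimistic target that subtracts a multiple of the ensemble disagreement from the ensemble mean. The snippet below is a minimal illustration of that idea only; the function name `pessimistic_value` and the penalty coefficient `beta` are hypothetical and not taken from the paper.

```python
import numpy as np

def pessimistic_value(ensemble_q, beta=1.0):
    """Lower-confidence-bound value target for offline RL.

    ensemble_q: array of shape (n_members, batch), one Q estimate
                per ensemble member for each state-action pair.
    beta:       penalty weight on ensemble disagreement (hypothetical
                hyperparameter; larger beta = more conservative).
    Returns the per-sample target: mean(Q) - beta * std(Q), so
    actions the ensemble disagrees on are valued pessimistically.
    """
    q = np.asarray(ensemble_q, dtype=float)
    return q.mean(axis=0) - beta * q.std(axis=0)

# Example: two candidate actions scored by a 3-member ensemble.
# Action 1 has higher mean Q but also higher disagreement.
qs = np.array([[1.0, 2.0],
               [1.2, 1.6],
               [0.8, 2.4]])
targets = pessimistic_value(qs, beta=0.5)
```

With this construction, out-of-distribution actions (where ensemble members disagree) receive lower targets, which is one standard way to mitigate the distribution-shift problem the abstract refers to.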



