  • Open Access
  • Article

World Model Enhanced Offline Reinforcement Learning for Sequential Intervention Optimization in Acute Kidney Injury

  • Bailing Zhang 1,*,   
  • Yuwei Mi 2

Received: 17 Dec 2025 | Revised: 31 Jan 2026 | Accepted: 02 Feb 2026 | Published: 04 Mar 2026

Abstract

Acute kidney injury (AKI) represents a critical clinical challenge in intensive care units, affecting approximately 20–50% of critically ill patients, with mortality rates exceeding 50% in severe cases. The sequential nature of AKI management—involving continuous decisions about diuretics, fluid balance, and renal replacement therapy (RRT)—presents a natural application for reinforcement learning (RL). However, the inherent risks of deploying learned policies in healthcare necessitate robust offline learning approaches that can leverage historical electronic health records without direct patient interaction. This paper introduces World Model Enhanced Offline Reinforcement Learning (WME-ORL), a novel framework that integrates an ensemble Fourier Neural Operator-Transformer (FNO-Transformer) world model with stage-aware Implicit Q-Learning (IQL) for optimizing sequential interventions in AKI patients. Our approach addresses three fundamental challenges in medical offline RL: distribution shift, through uncertainty-penalized value estimation; clinical safety, through explicit rule integration; and patient heterogeneity, through AKI stage-aware policy adaptation. Extensive experiments on 46,337 ICU patients from MIMIC-IV demonstrate that WME-ORL achieves superior policy value (interquartile-normalized return, IQNR: 0.82 vs. 0.72 for standard IQL), reduces predicted RRT initiation rates by 31.9%, and keeps clinical rule violations below 5%. Ablation studies reveal that the FNO-Transformer architecture provides the most reliable uncertainty estimates among the compared architectures, a critical property for safe clinical deployment.
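Two of the mechanisms named in the abstract can be sketched compactly: uncertainty-penalized value estimation via ensemble disagreement, and the asymmetric expectile loss at the core of IQL. The sketch below is a minimal illustration of these standard techniques, not the authors' implementation; the penalty weight `lam` and expectile `tau` are hypothetical placeholders for tuned hyperparameters.

```python
import numpy as np

def penalized_value(q_preds, lam=1.0):
    """Pessimistic value estimate: the ensemble mean minus a
    disagreement penalty (std across ensemble members), which
    down-weights state-actions the world model is unsure about."""
    q_preds = np.asarray(q_preds)  # shape: (n_members, batch)
    return q_preds.mean(axis=0) - lam * q_preds.std(axis=0)

def expectile_loss(diff, tau=0.7):
    """IQL's asymmetric squared loss on the TD residual `diff`:
    positive errors are weighted by tau, negative by (1 - tau),
    so the value function tracks an upper expectile of returns."""
    weight = np.where(diff > 0.0, tau, 1.0 - tau)
    return float((weight * diff ** 2).mean())

# Two ensemble members agree on the second state-action but not
# the first, so only the first estimate is penalized.
v = penalized_value([[1.0, 2.0], [3.0, 2.0]], lam=1.0)
```

With `lam=1.0` the disagreeing estimate (mean 2.0, std 1.0) is pushed down to 1.0, while the unanimous one stays at 2.0; larger `lam` makes the policy more conservative about out-of-distribution actions.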


How to Cite
Zhang, B.; Mi, Y. World Model Enhanced Offline Reinforcement Learning for Sequential Intervention Optimization in Acute Kidney Injury. AI Medicine 2026, 3 (1), 2. https://doi.org/10.53941/aim.2026.100002.
Copyright & License
Copyright (c) 2026 by the authors.