  • Open Access
  • Article

Stable CDE Autoencoders with Acuity Regularization for Offline Reinforcement Learning in Sepsis Treatment

  • Yue Gao

Received: 13 Oct 2025 | Revised: 13 Nov 2025 | Accepted: 18 Nov 2025 | Published: 12 Dec 2025

Abstract

Effective reinforcement learning (RL) for sepsis treatment depends on learning stable, clinically meaningful state representations from irregular ICU time series. Previous work has explored representation learning for this task, but it has overlooked a critical challenge: training instability in sequential representations and its detrimental impact on policy performance. This work demonstrates that state representations learned with Controlled Differential Equations (CDEs) can yield strong RL policies when two conditions are met: (1) training is kept stable through early stopping or stabilization methods, and (2) representations are made acuity-aware through correlation regularization with clinical scores (SOFA, SAPS-II, OASIS). Experiments on the MIMIC-III sepsis cohort show that a stable CDE autoencoder produces representations strongly correlated with acuity scores and enables RL policies with superior performance (weighted importance sampling (WIS) return > 0.9). In contrast, unstable CDE training leads to degraded representations and policy failure (WIS return ∼ 0). Visualizations of the latent space show that stable CDEs not only separate survivor and non-survivor trajectories but also reveal clear acuity score gradients, whereas unstable training fails to capture either pattern. These findings provide practical guidelines for using CDEs to encode irregular medical time series in clinical RL and emphasize the need for training stability in sequential representation learning.
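As a rough illustration of the correlation regularization idea mentioned in the abstract, the sketch below adds a penalty that rewards a latent summary for correlating with acuity scores. It is a minimal sketch under stated assumptions, not the article's implementation: the function names (`pearson`, `acuity_penalty`, `total_loss`), the use of a single scalar summary per latent state, the averaging over the three scores, and the `weight` parameter are all hypothetical choices made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 if either sequence is constant (correlation undefined).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def acuity_penalty(latent_summary, acuity_scores):
    """Negative mean correlation between a latent summary and each score.

    latent_summary: one scalar per time step (assumed here to be some
        readout of the encoder's latent state; hypothetical).
    acuity_scores: dict mapping a score name (e.g. "SOFA") to a sequence
        of the same length as latent_summary.
    Lower values mean the latent summary tracks acuity more closely.
    """
    corrs = [pearson(latent_summary, s) for s in acuity_scores.values()]
    return -sum(corrs) / len(corrs)


def total_loss(reconstruction_loss, latent_summary, acuity_scores, weight=1.0):
    """Autoencoder loss plus the weighted acuity correlation penalty."""
    return reconstruction_loss + weight * acuity_penalty(latent_summary, acuity_scores)
```

In this sketch, a latent summary that rises and falls with SOFA, SAPS-II, and OASIS lowers the total loss, which is one simple way to push a representation toward being acuity-aware. In a real training loop the same quantity would be computed with a differentiable tensor library so that gradients reach the encoder.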

How to Cite
Gao, Y. Stable CDE Autoencoders with Acuity Regularization for Offline Reinforcement Learning in Sepsis Treatment. Transactions on Artificial Intelligence 2025, 1 (1), 307–325. https://doi.org/10.53941/tai.2025.100021.
Copyright & License
Copyright (c) 2025 by the authors.