Diffusion-Pref: Diffusion World Model Guided Zero-Shot Preference Learning for Safe Glucose Control in Type 1 Diabetes

  • Bailing Zhang 1,*
  • Yuwei Mi 2

Received: 17 Dec 2025 | Revised: 21 Mar 2026 | Accepted: 28 Apr 2026 | Published: 09 May 2026

Abstract

Automated blood glucose control for Type 1 diabetes (T1D) is a safety-critical clinical challenge in which the consequences of suboptimal control, particularly hypoglycemia, can be life-threatening. Existing reinforcement learning (RL) approaches are limited by the prohibition on online exploration, the difficulty of specifying clinically meaningful reward functions, and insufficient guarantees of worst-case safety. We propose Diffusion-Pref, a purely offline RL framework that integrates three synergistic components: (i) a conditional diffusion world model that captures the multimodal distributional uncertainty of glucose dynamics; (ii) a zero-shot preference construction method that automatically generates trajectory preference labels from established clinical metrics, eliminating the need for human annotation; and (iii) a Conditional Value-at-Risk (CVaR)-regularized Implicit Q-Learning (IQL) algorithm that explicitly optimizes for worst-case safety. We evaluate Diffusion-Pref on the OhioT1DM dataset comprising 12 real-world T1D patients. The proposed method achieves a Time-in-Range (TIR) of 89.7%, substantially exceeding both the historical treatment record (75.0%) and a Conservative Q-Learning (CQL) baseline (78.1%). Severe hypoglycemia (glucose <54 mg/dL) is reduced from 6.9% to 1.8%, a 74% relative reduction. The overall Time Below Range (TBR) of 10.3% exceeds the recommended <4% target; however, supplementary experiments with an explicit TBR penalty (λ_TBR = 2.0) demonstrate that TBR can be reduced to 5.1% while preserving a TIR of 83.4%. Ablation studies confirm that each component (diffusion world model, zero-shot preference learning, and CVaR constraints) contributes meaningfully to performance. These results demonstrate that combining generative world models with clinically grounded preference learning and risk-sensitive policy optimization offers a promising pathway toward safer offline glucose control, although prospective clinical validation remains necessary before deployment.
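To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes TIR, TBR, and severe-hypoglycemia fractions from a glucose trace, derives a preference label between two trajectories without human annotation, and evaluates an empirical lower-tail CVaR. The 70–180 mg/dL target range, the scoring formula in `clinical_score`, the way `lambda_tbr` enters that score, and all function names are assumptions made for illustration; only the <54 mg/dL threshold and the value 2.0 for the TBR penalty come from the abstract.

```python
import math
from typing import Dict, Sequence

# Thresholds in mg/dL. The <54 cut-off is stated in the abstract; the
# 70-180 target range is an assumption (the abstract does not define it).
TIR_LOW, TIR_HIGH, SEVERE_HYPO = 70.0, 180.0, 54.0


def glycemic_metrics(glucose: Sequence[float]) -> Dict[str, float]:
    """Fractions of readings in range, below range, and severely low."""
    n = len(glucose)
    if n == 0:
        raise ValueError("empty glucose trace")
    return {
        "tir": sum(TIR_LOW <= g <= TIR_HIGH for g in glucose) / n,
        "tbr": sum(g < TIR_LOW for g in glucose) / n,
        "severe": sum(g < SEVERE_HYPO for g in glucose) / n,
    }


def clinical_score(glucose: Sequence[float], lambda_tbr: float = 2.0) -> float:
    """Hypothetical scalar score: reward TIR, penalize TBR and severe lows."""
    m = glycemic_metrics(glucose)
    return m["tir"] - lambda_tbr * m["tbr"] - m["severe"]


def preference_label(traj_a: Sequence[float], traj_b: Sequence[float],
                     lambda_tbr: float = 2.0) -> float:
    """Return 1.0 if A is preferred, 0.0 if B is preferred, 0.5 on a tie."""
    score_a = clinical_score(traj_a, lambda_tbr)
    score_b = clinical_score(traj_b, lambda_tbr)
    if score_a == score_b:
        return 0.5
    return 1.0 if score_a > score_b else 0.0


def cvar_lower(returns: Sequence[float], alpha: float = 0.1) -> float:
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) returns."""
    if not returns:
        raise ValueError("empty return sample")
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


if __name__ == "__main__":
    safe = [110, 120, 150, 170, 95]    # every reading inside 70-180
    risky = [50, 65, 120, 200, 110]    # two lows, one of them below 54
    print(glycemic_metrics(risky))
    print(preference_label(safe, risky))
    print(cvar_lower([-4.0, 0.0, 2.0, 6.0], alpha=0.5))
```

In this sketch the label depends only on metrics computed from the trajectories themselves, which is the sense in which no human annotator is needed. Averaging the worst tail of returns, rather than all returns, is what makes a CVaR-style objective sensitive to rare but dangerous outcomes such as severe hypoglycemia.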

How to Cite
Zhang, B.; Mi, Y. Diffusion-Pref: Diffusion World Model Guided Zero-Shot Preference Learning for Safe Glucose Control in Type 1 Diabetes. LifeAI 2026, 1 (1), 4.
Copyright & License
Copyright (c) 2026 by the authors.