Diffusion-Pref: Diffusion World Model Guided Zero-Shot Preference Learning for Safe Glucose Control in Type 1 Diabetes

Bailing Zhang; Yuwei Mi

Abstract

Automated blood glucose control for Type 1 diabetes (T1D) is a safety-critical clinical challenge in which the consequences of suboptimal control, particularly hypoglycemia, can be life-threatening. Existing reinforcement learning (RL) approaches are limited by the prohibition on online exploration, the difficulty of specifying clinically meaningful reward functions, and insufficient guarantees of worst-case safety. We propose Diffusion-Pref, a purely offline RL framework that integrates three synergistic components: (i) a conditional diffusion world model that captures the multimodal distributional uncertainty of glucose dynamics; (ii) a zero-shot preference construction method that automatically generates trajectory preference labels from established clinical metrics, eliminating the need for human annotation; and (iii) a Conditional Value-at-Risk (CVaR)-regularized Implicit Q-Learning (IQL) algorithm that explicitly optimizes for worst-case safety. We evaluate Diffusion-Pref on the OhioT1DM dataset comprising 12 real-world T1D patients. The proposed method achieves a Time-in-Range (TIR) of 89.7%, substantially exceeding both the historical treatment record (75.0%) and a Conservative Q-Learning (CQL) baseline (78.1%). Severe hypoglycemia (glucose <54 mg/dL) is reduced from 6.9% to 1.8%, a 74% relative reduction. The overall Time Below Range (TBR) of 10.3% exceeds the recommended <4% target; however, supplementary experiments with an explicit TBR penalty (λ_TBR = 2.0) demonstrate that TBR can be reduced to 5.1% while preserving a TIR of 83.4%. Ablation studies confirm that each component (diffusion world model, zero-shot preference learning, and CVaR constraints) contributes meaningfully to performance. These results demonstrate that combining generative world models with clinically grounded preference learning and risk-sensitive policy optimization offers a promising pathway toward safer offline glucose control, although prospective clinical validation remains necessary before deployment.
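
To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes TIR, TBR, and severe-hypoglycemia fractions from a glucose trace, derives a pairwise preference label from those metrics without human annotation, and evaluates an empirical CVaR over a set of returns. The 70–180 mg/dL range and the <70 mg/dL TBR threshold follow the international consensus on Time in Range; the <54 mg/dL threshold is the one stated above. The function names, the linear scoring rule, and the weights `lam_tbr` and `lam_severe` are assumptions introduced purely for illustration; the abstract does not specify how the paper combines the metrics or how λ_TBR enters the objective.

```python
import numpy as np


def clinical_metrics(glucose_mg_dl):
    """Fraction of readings in range, below range, and severely low."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    return {
        "tir": float(np.mean((g >= 70) & (g <= 180))),  # Time-in-Range
        "tbr": float(np.mean(g < 70)),                  # Time Below Range
        "severe_hypo": float(np.mean(g < 54)),          # severe hypoglycemia
    }


def clinical_score(glucose_mg_dl, lam_tbr=2.0, lam_severe=4.0):
    """Hypothetical scalar score: reward TIR, penalize low glucose.

    The linear form and both weights are illustrative assumptions.
    """
    m = clinical_metrics(glucose_mg_dl)
    return m["tir"] - lam_tbr * m["tbr"] - lam_severe * m["severe_hypo"]


def preference_label(traj_a, traj_b, **weights):
    """Return 1.0 if traj_a is preferred, 0.0 if traj_b is, 0.5 on a tie."""
    score_a = clinical_score(traj_a, **weights)
    score_b = clinical_score(traj_b, **weights)
    if score_a > score_b:
        return 1.0
    if score_a < score_b:
        return 0.0
    return 0.5


def empirical_cvar(returns, alpha=0.1):
    """Mean of the worst alpha-fraction of returns (lower tail)."""
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * r.size)))  # number of tail samples
    return float(r[:k].mean())


if __name__ == "__main__":
    stable = [110, 120, 150, 95, 130]   # all readings in range
    risky = [110, 65, 50, 190, 130]     # includes low and severely low readings
    print(clinical_metrics(risky))
    print(preference_label(stable, risky))
    print(empirical_cvar([4, 1, 3, 2], alpha=0.5))
```

Labels produced this way could serve as training targets for a preference-based reward model, and the CVaR term shows why a risk-sensitive objective focuses on the worst outcomes rather than the average: only the lowest-return tail contributes to it.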

Scilight Press
