Open Access Article

Implicit Q-Learning for Offline Reinforcement Learning in Blood Glucose Management: A Cross-Dataset Evaluation Study

  • Bailing Zhang 1,*,   
  • Yuwei Mi 2

Received: 30 Dec 2025 | Revised: 02 Feb 2026 | Accepted: 28 Feb 2026 | Published: 16 Mar 2026

Abstract

Offline reinforcement learning enables learning insulin dosing policies from historical data without risky patient interactions. This study evaluates Implicit Q-Learning (IQL) on three real-world continuous glucose monitoring datasets: OhioT1DM (USA, type 1), ShanghaiT1DM (China, type 1), and ShanghaiT2DM (China, type 2). IQL achieved time in range of 68.5%, 54.3%, and 78.9% respectively. Cross-dataset transfer experiments demonstrated exceptional generalization with over 98% performance retention across geographic regions and diabetes types, suggesting that IQL captures fundamental glucose-insulin dynamics rather than dataset-specific patterns. Ablation studies validated our clinically-motivated reward function design, while sensitivity and robustness analyses confirmed algorithm stability across hyperparameter choices and data quality perturbations.
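The headline results (68.5%, 54.3%, and 78.9%) are time in range (TIR), the fraction of CGM readings falling within the consensus glycemic target of 70–180 mg/dL. As a minimal sketch of how such a metric is computed from raw CGM data (the function name and example readings are illustrative, not taken from the paper):

```python
import numpy as np

def time_in_range(glucose_mg_dl, low=70.0, high=180.0):
    """Percentage of CGM readings within the consensus target range.

    The international consensus on time in range defines TIR as the
    share of glucose readings in [70, 180] mg/dL. Thresholds are
    parameters here so tighter or wider ranges can be evaluated too.
    """
    g = np.asarray(glucose_mg_dl, dtype=float)
    in_range = (g >= low) & (g <= high)   # boolean mask per reading
    return 100.0 * in_range.mean()        # mean of mask = fraction in range

# Illustrative example: 8 readings, 6 of which fall in [70, 180]
readings = [65, 80, 110, 150, 175, 190, 120, 95]
print(time_in_range(readings))  # 75.0
```

In practice TIR is computed per patient over the full CGM trace and then averaged, so readings should be resampled to a uniform interval first to avoid weighting dense segments more heavily.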


How to Cite
Zhang, B.; Mi, Y. Implicit Q-Learning for Offline Reinforcement Learning in Blood Glucose Management: A Cross-Dataset Evaluation Study. Sensors and AI 2026, 2 (1), 17–32. https://doi.org/10.53941/sai.2026.100002.
Copyright & License
Copyright (c) 2026 by the authors.