Offline reinforcement learning enables learning insulin dosing policies from historical data without risky interaction with patients. This study evaluates Implicit Q-Learning (IQL) on three real-world continuous glucose monitoring datasets: OhioT1DM (USA, type 1), ShanghaiT1DM (China, type 1), and ShanghaiT2DM (China, type 2). IQL achieved time in range of 68.5%, 54.3%, and 78.9%, respectively. Cross-dataset transfer experiments demonstrated strong generalization, with over 98% performance retention across geographic regions and diabetes types, suggesting that IQL captures fundamental glucose-insulin dynamics rather than dataset-specific patterns. Ablation studies validated our clinically motivated reward function design, while sensitivity and robustness analyses confirmed algorithm stability across hyperparameter choices and data-quality perturbations.
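The distinguishing feature of IQL is that it fits a state-value function with expectile regression over dataset actions only, so the policy is never evaluated on out-of-distribution actions. A minimal sketch of the asymmetric expectile loss (the expectile parameter `tau=0.7` is an illustrative value, not necessarily the one used in this study):

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss used by IQL to fit V(s) toward Q(s, a).

    Positive residuals (Q above V) are weighted by tau, negative ones
    by 1 - tau, so tau > 0.5 biases V toward an upper expectile of Q.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return float(np.mean(weight * diff**2))

# In training, diff = Q(s, a) - V(s) over (s, a) pairs from the dataset;
# here we just evaluate the loss on example residuals.
residuals = np.array([1.0, -1.0])
print(expectile_loss(residuals, tau=0.7))  # (0.7*1 + 0.3*1) / 2 = 0.5
```

With `tau = 0.5` this reduces to ordinary mean-squared error; pushing `tau` toward 1 makes the value estimate approach the in-dataset maximum of Q without ever querying actions outside the data.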

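Time in range, the primary outcome reported above, can be computed directly from a stream of CGN readings. A minimal sketch, assuming the standard consensus target band of 70-180 mg/dL (the paper's exact thresholds are an assumption here):

```python
import numpy as np

def time_in_range(glucose_mg_dl, low=70.0, high=180.0):
    """Fraction of CGM readings falling within [low, high] mg/dL.

    Assumes uniformly spaced readings, so the fraction of samples in
    range equals the fraction of time in range.
    """
    g = np.asarray(glucose_mg_dl, dtype=float)
    return float(np.mean((g >= low) & (g <= high)))

readings = [65, 100, 150, 200, 110, 180]  # mg/dL
print(f"{time_in_range(readings):.1%}")   # 4 of 6 readings in range -> 66.7%
```

Because CGM devices sample at a fixed interval (typically every 5 minutes), the sample fraction is a faithful estimate of the clinical TIR percentage.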


