2605004081
  • Open Access
  • Perspective

Controlling the Instructional Structure in Generative AI Learning Evaluation Experiments

  • Zerong Xie *
  • Senze Gao

Received: 03 Dec 2025 | Revised: 26 May 2026 | Accepted: 28 May 2026 | Published: 10 Jun 2026

Abstract

Comparisons of Generative AI (GenAI) learning interventions raise an important methodological consideration regarding instructional structure. The structural fluidity of AI interactions may introduce uncontrolled variability that confounds between-group comparisons. This is a sharp instance of a broader threat to validity in learning evaluation experiments, where the surrounding instructional structure often remains unspecified even when a particular research element is being tested. Confounders such as instructor delivery, the clarity of an opening explanation, or the form of practice may systematically bias inferences about the manipulated variable. This commentary proposes a four-phase framework (Establish Relevance, Technical Details, Intuition, and Practice) for aligning the instructional structure of learning evaluation experiments. Using the teaching of recursion in computer science as a case study, we demonstrate a procedure for standardizing each phase across conditions. We then examine how the probabilistic nature of large language models complicates, without invalidating, structural control in GenAI research, and we identify practical strategies for converting stochastic output variance from a structural confound into bounded measurement noise.

 


How to Cite
Xie, Z.; Gao, S. Controlling the Instructional Structure in Generative AI Learning Evaluation Experiments. Library, Information & Services 2026, 1 (1), 6.
Copyright & License
Copyright (c) 2026 by the authors.