Bias-Corrected RMSD Item Fit Statistic via SIMEX

Alexander Robitzsch

doi:10.53941/ams.2026.100010

Abstract

This study evaluates the simulation extrapolation (SIMEX) method as a bias-correction approach for the distribution-weighted and difficulty-weighted root mean square deviation (RMSD) item fit statistics. The results indicate that SIMEX reduces the positive bias of the original RMSD statistic and can be applied in differential item functioning (DIF) analysis. Although the SIMEX-based RMSD statistics showed slightly greater bias than previously proposed analytic corrections, they yielded a lower root mean square error (RMSE) for items with DIF. For items without DIF, the analytic bias-correction methods performed better with respect to both bias and RMSE. An empirical example further showed that the SIMEX-based and analytically bias-corrected RMSD statistics produced very similar estimates.
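To illustrate the general idea behind SIMEX-type bias correction, the following is a minimal sketch in Python and not the estimator studied in the article. It assumes a toy setting in which the RMSD compares observed and model-implied proportions in a few ability groups, the sampling variance of each observed proportion is binomial, and a linear extrapolant in the noise factor lambda is adequate. All function names, parameter names, and numerical values are hypothetical.

```python
import math
import random


def msd(p_obs, p_model, weights):
    """Weighted mean squared deviation between observed and model-implied proportions."""
    return sum(w * (o - m) ** 2 for w, o, m in zip(weights, p_obs, p_model))


def simex_rmsd(p_obs, p_model, weights, n_per_group,
               lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_rep=1000, seed=1):
    """Toy SIMEX correction of the RMSD (all names here are hypothetical)."""
    rng = random.Random(seed)
    # Assumed binomial sampling variance of each observed proportion.
    var = [p * (1.0 - p) / n for p, n in zip(p_obs, n_per_group)]

    # Simulation step: add noise with variance lambda * var and average the MSD.
    mean_msd = []
    for lam in lambdas:
        total = 0.0
        for _ in range(n_rep):
            noisy = [p + rng.gauss(0.0, math.sqrt(lam * v))
                     for p, v in zip(p_obs, var)]
            total += msd(noisy, p_model, weights)
        mean_msd.append(total / n_rep)

    # Extrapolation step: least-squares line in lambda, evaluated at lambda = -1.
    k = len(lambdas)
    mx = sum(lambdas) / k
    my = sum(mean_msd) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(lambdas, mean_msd))
             / sum((x - mx) ** 2 for x in lambdas))
    intercept = my - slope * mx
    # Truncate at zero before taking the square root.
    return math.sqrt(max(intercept - slope, 0.0))


# Hypothetical example data: five ability groups with a constant misfit of 0.06.
weights = [0.1, 0.2, 0.4, 0.2, 0.1]
p_model = [0.20, 0.35, 0.50, 0.65, 0.80]
p_obs = [p + 0.06 for p in p_model]
n_per_group = [200] * 5

naive = math.sqrt(msd(p_obs, p_model, weights))
corrected = simex_rmsd(p_obs, p_model, weights, n_per_group)
print(f"naive RMSD: {naive:.4f}, SIMEX-corrected RMSD: {corrected:.4f}")
```

In this toy setting, the expected squared deviation grows linearly with the added noise, which is why a straight line is extrapolated back to lambda = -1 (the hypothetical noise-free case). The extrapolation is done on the squared scale, and the result is truncated at zero before taking the square root.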

