Have Large Language Models Improved Research Methodology?

Fredric Narcross; Robert Marks

Abstract

Over the last 30 years, research methods have shifted from manual, library-based approaches to methods supported by digital tools. Current Large Language Models (LLMs), such as ChatGPT, are no longer relegated to literature search and have expanded into research analysis and knowledge discovery. However, their dependability for conducting meaningful research, especially in specialized, interdisciplinary areas, has not been thoroughly assessed. This study evaluated whether contemporary LLMs can reproduce the research outcomes of a fully documented human study: a 1991 article that identified dermatophytosis (ringworm) in historical fine art. This niche, interdisciplinary topic was selected as a deliberate stress test of LLM capabilities in the research frontier, where cross-domain synthesis and deep domain knowledge are required. Ten commercially available LLMs were systematically tested using two prompt conditions: a basic factual query and a complex motivational prompt designed to elicit human-level research performance. LLM responses were graded on a scale from 0 to 3 based on eight characteristics of the artwork and three factors from the benchmark article. Four categories, which included complete fabrication, misattribution, embellishment, and overclaiming, were used to identify, count, and categorize hallucinations. The original paper, digital collections, and museum databases were all examined as part of the verification process utilized for this classification. Statistical comparisons used the Wilcoxon signed-rank test and Fisher’s exact test. No LLM rediscovered any of the seven artworks identified in the original article. 3 out of 10 LLMs (30%) produced incorrect information in response to the simple prompt (M = 0.40, SD = 0.70). 9 of 10 LLMs (90%) produced incorrect information in response to the complex prompt, resulting in 48 total instances (M = 4.80, SD = 4.52). The rise was statistically significant (effect size r = 0.91; Fisher’s exact test p = 0.020; Wilcoxon W = 0.0, p = 0.004). Perplexity Pro Deep Research, despite providing the most detailed etiological information (scoring 3 on all three article-level characteristics), also produced the most hallucinations (n = 17). LLMs consistently fabricated plausible-sounding content rather than acknowledging uncertainty. In the specialized interdisciplinary domain tested here, current LLMs proved unreliable as autonomous research agents. Between the two prompt conditions tested, hallucination rates rose twelvefold when the prompt moved from a basic factual query to a complex motivational prompt that explicitly demanded a detailed report. This pattern is consistent with prioritizing output completion above factual accuracy; Reinforcement Learning from Human Feedback (RLHF) is discussed as a plausible explanation for the pattern, but not as a demonstrated mechanism. LLMs can be helpful as research tools when experts in the field use them to verify the results. However, they should not be viewed as independent research agents. It is also relevant for researchers to keep prompts simple and limited in scope to produce more reliable results and to view confident responses with skepticism. Independent verification is necessary.

References

1.
Marks, R. Dermatophytoses in Art. J. Med. Vet. Mycol. 1991, 29, 1–8. https://doi.org/10.1080/02681219180000021.
2.
ChatGPT 2025. Available online: https://chatgpt.com/ (accessed on 2 March 2026).
3.
Mishra, T.; Sutanto, E.; Rossanti, R.; et al. Use of large language models as artificial intelligence tools in academic research and publishing among global clinical researchers. Sci. Rep. 2024, 14, 31672. https://doi.org/10.1038/s41598-024-81370-6.
4.
Huang, L.; Yu, W.; Ma, W.; et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv 2023, arXiv:2311.05232.
5.
Ji, Z.; Lee, N.; Frieske, R.; et al. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 2023, 55, 1–38. https://doi.org/10.1145/3571730.
6.
Ouyang, L.; Wu, J.; Jiang, X.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
7.
Christiano, P.F.; Leike, J.; Brown, T.; et al. Deep reinforcement learning from human preferences. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
8.
Salecha, A.; Ireland, M.E.; Subrahmanya, S.; et al. Large language models display human-like social desirability biases in Big Five personality surveys. PNAS Nexus 2024, 3, pgae533. https://doi.org/10.1093/pnasnexus/pgae533.
9.
Qin, J.; Liu, C.; Cheng, S.; et al. Freeze the Backbones: a Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-Training. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 14–19 April 2024.
10.
Liu, C.; Ouyang, C.; Cheng, S.; et al. G2D: From global to dense radiography representation learning via vision-language pre-training. Adv. Neural Inf. Process. Syst. 2024, 37, 14751–14773.
11.
Banerjee, S.; Agarwal, A.; Singla, S.; et al. LLMs Will Always Hallucinate, and We Need to Live with This. arXiv 2024, arXiv:2409.05746.
12.
Abdurahman, S.; Salkhordeh Z, A.; Moore, A.K.; et al. A Primer for Evaluating Large Language Models in Social-Science Research. Adv. Methods Pract. Psychol. Sci. 2025, 8, 25152459251325174. https://doi.org/10.1177/25152459251325174.
13.
QuestionPro. Research Process Steps: What They Are + How to Follow. Available online: https://www.questionpro.com/blog/research-process-steps/ (accessed on 10 March 2026).
14.
Alphonse, N. The Main Stages of the Research Process—A Review of the Literature. Int. J. Res. Rev. 2023, 10, 671–675.
15.
Library of Congress. How to Use the Card Catalog. 2020. Available online: https://guides.loc.gov/card-catalog/using-the-card-catalog (accessed on 10 March 2026).
16.
inetAsia. The Importance of Search Engines. 2025. Available online: https://www.inetasia.com/articles/the-importance-of-search-engines/ (accessed on 11 March 2026).
17.
Wang, J. Streamline Your Research Using Academic Search Engines. Times Higher Education. 2022. Available online: https://www.timeshighereducation.com/campus/streamline-your-research-using-academic-search-engines (accessed on 3 March 2026).
18.
Jirousek, C. The Evolution of the Idea of Art. Art, Design, and Visual Thinking. 1995. Available online: http://char.txa.cornell.edu/ART/FINEART/EVOLIDEA/evolidea.htm (accessed on 19 March 2026).
19.
Wallentine, A. How Artists Have Explored and Understood the Human Body through Time, Hyerallergic. 2022. Available online: https://hyperallergic.com/716611/flesh-and-bones-getty-research-institute/#:~:text=Posted%20inArt-,How%20Artists%20Have%20Explored%20and%20Understood%20the%20Human%20Body%20Through,Getty%20Research%20Institute%2C%20Los%20Angeles) (accessed on 8 March 2026).
20.
What Is Fine Art? Definition, History, and Examples. Grove Gallery. 2025. Available online: https://grovegallery.com/blogs/articles/what-is-fine-art-definition-history-and-examples?srsltid=AfmBOoonDLLAqTlFTfJeFELro_a2PSNcmoY05BAgGFn2O-uuSk2vj7NZ (accessed on 8 March 2026).
21.
Fungi. 2019. Available online: https://byjus.com/biology/kingdom-fungi/ (accessed on 7 March 2026).
22.
Moskaluk, A.; VandeWoude, S. Current Topics in Dermatophyte Classification and Clinical Diagnosis. Pathogens 2022, 11, 957.
23.
Hon, A. Mycology of Dermatophyte Infections. 2023. Available online: https://dermnetnz.org/topics/mycology-of-dermatophyte-infections (accessed on 7 March 2026).
24.
Kovitwanichkanont, T.; Chong, A. Superficial Fungal Infections. Aust. J. Gen. Pract. 2019, 48, 706–711.
25.
Pepin, J. The Artistry of Jacques Pepin. 2025. Available online: https://jacquespepinart.com/art/the-human-form-in-art-history (accessed on 7 March 2026).
26.
Svartefoss, S.; Jungblut, J.; Aksnes, D.; et al. Explaining research performance: Investigating the importance of motivation. SN Soc. Sci. 2024, 4, 105. https://doi.org/10.1007/s43545-024-00895-9.
27.
Streefkerk, R. Inductive vs. Deductive Research Approach. Steps & Examples. 2023. Available online: https://www.scribbr.com/methodology/inductive-deductive-reasoning/#:~:text=Published%20on%20April%2018%2C%202019,at%20testing%20an%20existing%20theory (accessed on 7 March 2026).
28.
OpenAI’S O3 Scores 136 on the Mensa Norway Test, Surpassing 98% of the Human Population. CryptoSlate. 2025. Available online: https://cryptoslate.com/openais-o3-scores-136-on-mensa-norway-test-surpassing-98-of-human-population/ (accessed on 7 March 2026).
29.
High, M. Why Elon Musk Claims Grok-3 Is the World’S ‘Smartest AI’. 2025. Available online: https://aimagazine.com/articles/is-grok-3-really-the-smartest-ai-on-earth (accessed on 7 March 2026).
30.
Chat & Ask AI, Research Assistant. 2025. Available online: https://askaichat.app/assistant?assistant=research_assistant (accessed on 7 March 2026).
31.
DeepSeek 2025. Available online: https://deepseek.ai/ (accessed on 1 March 2026).
32.
Gemini Pro 2.5. 2025. Available online: https://aistudio.google.com/app/prompts/new_chat?model=gemini-2.5-pro/ (accessed on 23 March 2026).
33.
Granite Enable Thinking. 2025. Available online: https://www.ibm.com/granite/playground/ (accessed on 12 March 2026).
34.
Grok 3. 2025. Available online: https://grok.com/ (accessed on 15 March 2026).
35.
Llama 3. 2025. Available online: https://www.llama.com/ (accessed on 1 March 2026).
36.
Andaluca, Santa Caridad Hospital, Available online: https://en.andalucia.org/listing/santa-caridad-hospital/15961101/ (accessed on 15 March 2026).
37.
Murillo, B. Saint Thomas of Villanova Giving Alms, Museum of Fine Arts in Seville, Spain. 1668. Available online: https://www.museosdeandalucia.es/web/museodebellasartesdesevilla/obras-singulares/-/asset_publisher/GRnu6ntjtLfp/content/santo-tomas-de-villanueva-dando-limosnas?redirect=%2Fweb%2Fmuseodebellasartesdesevilla%2Fobras-singulares%3Fp_p_id%3D101_INSTANCE_GRnu6ntjtLfp%26p_p_lifecycle%3D0%26p_p_state%3Dnormal%26p_p_mode%3Dview%26p_p_col_id%3Dcolumn-2%26p_p_col_pos%3D1%26p_p_col_count%3D2%26_101_INSTANCE_GRnu6ntjtLfp_delta%3D6%26_101_INSTANCE_GRnu6ntjtLfp_keywords%3D%26_101_INSTANCE_GRnu6ntjtLfp_advancedSearch%3Dfalse%26_101_INSTANCE_GRnu6ntjtLfp_andOperator%3Dtrue%26p_r_p_564233524_resetCur%3Dfalse%26_101_INSTANCE_GRnu6ntjtLfp_cur%3D7&inheritRedirect=true (accessed on 30 March 2026).
38.
Goya, F. La Boda (The Wedding). Museo del Prado, Madrid, España. 1792. Available online: https://www.museodelprado.es/coleccion/obra-de-arte/la-boda/6340b840-5e11-49cd-9151-0c1fdd240389 (accessed on 4 March 2026). (In Spanish)
39.
Pils, I. The Prayer of the Teig Children. Musee de l’Assistance Publique, Hopitaux de Paris, Paris, France. 1853. Available online: https://en.muzeo.com/art-print/the-prayer-of-the-children-suffering-from-ringworm/isidore-pils (accessed on 7 March 2026).
40.
USEUM. Available online: https://www.useum.org/artwork/The-Regents-of-the-Leper-colony-in-Amsterdam-in-1649-Ferdinand-Bol (accessed on 11 March 2026).
41.
Vinkeles, R. De Vier Regenten van Het Leprozenhuis te Amsterdam, Rijksmuseum, Amsterdam, The Netherlands. 1769. Available online: https://www.rijksmuseum.nl/nl/collectie/object/De-regenten-van-het-Leprozenhuis-1649--c6c0ad57afbd895d93ae31d74dc3cdd3 (accessed on 13 March 2026).
42.
Ministry of Culture, POP, Open Heritage Foundation. Available online: https://pop.culture.gouv.fr/notice/palissy/PM28000982 (accessed on 15 March 2026).
43.
Spanish Arts, El Joven Mendigo, Available online: https://www.spanish-art.org/spanish-painting-el-joven-mendigo.html (accessed on 15 March 2026).
44.
Mukhopadhyay, A.K. A Historical Note on the Evolution of “Ringworm”. Indian J. Dermatol. Venereol. Leprol. 2019, 85, 125–128. https://doi.org/10.25259/IJDVL_123_2019.
45.
Historical Illustrations of Skin Disease: Selections from the New Sydenham Society Atlas 1860–1884. Yale University Library. 2022. Available online: https://onlineexhibits.library.yale.edu/s/skin-diseases/page/home (accessed on 23 March 2026).
46.
OpenAI. OpenAI o3 and o4-mini System Card. 16 April 2025. Available online: https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf (accessed on 20 March 2026).

Scilight Press

Author Information

Abstract

Graphical Abstract

Keywords

References

About Scilight

Journals

Publishing Policies

Contact Us