  • Open Access
  • Review
A Contemporary Survey of Large Language Model Assisted Program Analysis
  • Jiayimei Wang 1,   
  • Tao Ni 1,   
  • Wei-Bin Lee 2, 3,   
  • Qingchuan Zhao 1, *

Received: 06 Feb 2025 | Revised: 20 Apr 2025 | Accepted: 28 Apr 2025 | Published: 26 May 2025

Abstract

The increasing complexity of software systems has driven significant advancements in program analysis, as traditional methods are unable to meet the demands of modern software development. To address these limitations, deep learning techniques, particularly Large Language Models (LLMs), have gained attention for their context-aware code comprehension capabilities. Recognizing this potential, researchers have extensively explored the application of LLMs to program analysis since their introduction. Although several surveys cover LLM applications in cybersecurity, comprehensive reviews that specifically address their role in program analysis remain scarce. This survey reviews the application of LLMs in program analysis, categorizing existing work into static, dynamic, and hybrid approaches. We also identify current research hotspots, such as LLM-assisted automated vulnerability detection and code analysis; common challenges, including model interpretability and training-data limitations; and future directions, such as using LLMs to convert dynamic analysis tasks into static ones. This survey aims to demonstrate the potential of LLMs to advance program analysis practice and to offer actionable insights for security researchers seeking to enhance detection frameworks or develop domain-specific models.

How to Cite
Wang, J.; Ni, T.; Lee, W.-B.; Zhao, Q. A Contemporary Survey of Large Language Model Assisted Program Analysis. Transactions on Artificial Intelligence 2025, 1 (1), 105–129. https://doi.org/10.53941/tai.2025.100006.
Copyright & License
Copyright (c) 2025 by the authors.