Prompt Injection Detection in LLM Integrated Applications

Qianlong Lan; AnujKaul; Shaun Jones

doi:10.53941/ijndi.2025.100013

Abstract

The integration of large language models (LLMs) into creative applications has unlocked new capabilities but also introduced vulnerabilities, notably prompt injections. These are malicious inputs designed to manipulate model responses, posing threats to security, privacy, and functionality. This paper delves into the mechanisms of prompt injections, their impacts, and presents novel detection strategies. More specifically, the necessity for robust detection systems is outlined, a predefined list of banned terms is combined to embed techniques for similarity search, and a BERT (Bidirectional Encoder Representa- tions from Transformers) model is built to identify and mitigate prompt injections effectively with the aim to neutralize prompt injections in real-time. The research highlights the challenges in balancing secu- rity with usability, evolving attack vectors, and LLM limitationsm, and emphasizes the significance of securing LLM-integrated applications against prompt injections to preserve data privacy, user trust, and uphold ethical standards. This work aims to foster collaboration for developing standardized security frameworks, contributing to more safer and reliable AI-driven systems.

References

1.
Wei, J.; Tay, Y.; Bommasani, R.; et al. Emergent abilities of large language models. arXiv, 2022, arXiv: 2206.07682.
2.
Pandya, K.; Holia, M. Automating Customer Service using LangChain: Building custom open-source GPT Chatbot for organizations. arXiv, 2023, arXiv: 2310.05421.
3.
Yuan, A.; Coenen, A.; Reif, E.; et al. Wordcraft: Story writing with large language models. In Proceedings of the 27th International Conference on Intelligent User Interfaces, Helsinki, Finland, 22–25 March 2022; ACM: New York, NY, USA, 2022; pp. 841–852. doi: 10.1145/3490099.3511105
4.
Lopez-Lira, A.; Tang, Y.H. Can chatGPT forecast stock price movements? Return predictability and large language models. arXiv, 2023, arXiv: 2304.07619.
5.
Shayegani, E.; Mamun, A.A.; Fu, Y.; et al. Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv, 2023, arXiv: 2310.10844.
6.
Kour, G.; Zalmanovici, M.; Zwerdling, N.; et al. Unveiling safety vulnerabilities of large language models. In Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics, Singapore, 6 December 2023; ACL: Stroudsburg, PA, USA, 2023; pp. 111–127.
7.
Shi, J.W.; Liu, Y.X.; Zhou, P.; et al. BadGPT: Exploring security vulnerabilities of ChatGPT via backdoor attacks to InstructGPT. arXiv, 2023, arXiv: 2304.12298.
8.
Zhao, S.; Jia, M.H.Z.; Tuan, L.A.; et al. Universal vulnerabilities in large language models: Backdoor attacks for in-context learning. arXiv 2024, arXiv: 2401.05949.
9.
Liu, Y.; Deng, G.L.; Li, Y.K.; et al. Prompt injection attack against LLM-integrated applications. arXiv, 2023, arXiv: 2306.05499.
10.
Pedro, R.; Castro, D.; Carreira, P.; et al. From prompt injections to SQL injection attacks: How protected is your LLM-integrated web application? arXiv, 2023, arXiv: 2308.01990.
11.
Yu, J.H.; Wu, Y.H.; Shu, D.; et al. Assessing prompt injection risks in 200+ custom GPTs. arXiv, 2023, arXiv: 2311.11538.
12.
Wu, F.Z.; Zhang, N.; Jha, S.; et al. A new era in LLM security: Exploring security concerns in real-world LLM-based systems. arXiv, 2024, arXiv: 2402.18649.
13.
Greshake, K.; Abdelnabi, S.; Mishra, S.; et al. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, Copenhagen, Denmark, 30 November 2023; ACM: New York, NY, USA, 2023; pp. 79–90. doi: 10.1145/3605764.3623985
14.
Camacho-Collados, J.; Pilehvar, M.T. From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Intell. Res., 2018, 63: 743−788. doi: 10.1613/jair.1.11259
15.
Devlin, J.; Chang, M.W.; Lee, K.; et al. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; ACL: Stroudsburg, PA, USA, 2019; pp. 4171–4186. doi: 10.18653/v1/N19-1423
16.
OpenAI; Achiam, J.; Adler, S.; et al. GPT-4 technical report. arXiv, 2023, arXiv: 2303.08774.
17.
Anil, R.; Dai, A.M.; Firat, O.; et al. PaLM 2 technical report. arXiv, 2023, arXiv: 2305.10403.
18.
Touvron, H.; Lavril, T.; Izacard, G.; et al. LLaMA: Open and efficient foundation language models. arXiv, 2023, arXiv: 2302.13971.
19.
Wen, H.; Li, Y.C.; Liu, G.H.; et al. Empowering LLM to use smartphone for intelligent task automation. arXiv, 2023, arXiv: 2308.15272.
20.
Alto, V. Modern Generative AI with ChatGPT and OpenAI Models: Leverage the Capabilities of OpenAI's LLM for Productivity and Innovation with GPT3 and GPT4; Packt Publishing: Birmingham, UK, 2023.
21.
Yu, S.C.; Fang, C.R.; Ling, Y.C.; et al. LLM for test script generation and migration: Challenges, capabilities, and opportunities. In Proceedings of 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security, Chiang Mai, Thailand, 22–26 October 2023; IEEE: New York, NY, USA, 2023; pp. 206–217.
22.
Cui, C.; Ma, Y.S.; Cao, X.; et al. Drive as you speak: Enabling human-like interaction with large language models in autonomous vehicles. In Proceedings of 2024 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 1–6 January 2024; IEEE: New York, 2024; pp. 902–909.
23.
Das, A.; Tariq, A.; Batalini, F.; et al. Exposing vulnerabilities in clinical LLMs through data poisoning attacks: Case study in breast cancer. medRxiv 2024, in press. doi: 10.1101/2024.03.20.24304627
24.
Zou, A.; Wang, Z.F.; Carlini, N.; et al. Universal and transferable adversarial attacks on aligned language models. arXiv, 2023, arXiv: 2307.15043.
25.
Wei, A.; Haghtalab, N.; Steinhardt, J. Jailbroken: How does LLM safety training fail? In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
26.
Wang, J.G.; Wang, J.; Li, M.; et al. Pandora’s white-box: Increased training data leakage in Open LLMs. arXiv, 2024, arXiv: 2402.17012.
27.
Cai, X.R.; Xu, H.D.; Xu, S.H.; et al. BadPrompt: Backdoor attacks on continuous prompts. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022.
28.
Carlini, N.; Tramèr, F.; Wallace, E.; et al. Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium, 11–13 August 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 2633–2650.
29.
Liu, T.; Deng, Z.Z.; Meng, G.Z.; et al. Demystifying RCE vulnerabilities in LLM-integrated apps. arXiv, 2023, arXiv: 2309.02926.
30.
Zhong, L.; Wang, Z.L. A study on robustness and reliability of large language model code generation. arXiv, 2023, arXiv: 2308.10335.
31.
He, J.X.; Vechev, M. Large language models for code: Security hardening and adversarial testing. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark, 26–30 November 2023; ACM: New York, NY, USA, 2023; pp. 1865–1879. doi: 10.1145/3576915.3623175
32.
Greshake, K.; Abdelnabi, S.; Mishra, S.; et al. More than you've asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models. arXiv, 2023, arXiv: 2302.12173.
33.
Liu, X.G.; Yu, Z.Y.; Zhang, Y.Z.; et al. Automatic and universal prompt injection attacks against large language models. arXiv, 2024, arXiv: 2403.04957.
34.
Hadi, M.U.; Al-Tashi, Q.; Qureshi, R.; et al. A survey on large language models: Applications, challenges, limitations, and practical usage. Authorea Prepr. 2023. doi: 10.36227/techrxiv.23589741.v1
35.
Pilehvar, M.T.; Camacho-Collados, J. Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning; Morgan and Claypool Publishers: San Rafael, CA, USA, 2020.
36.
Xian, J.; Teofili, T.; Pradeep, R.; et al. Vector search with OpenAI embeddings: Lucene is all you need. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Merida Mexico, 4–8 March 2024; ACM: New York, NY, USA, 2024; pp. 1090–1093. doi: 10.1145/3616855.363569
37.
The Vector Database to Build Knowledgeable AI. Available online: https://www.pinecone.io/lp/vector-database/?utm_term=vector+search (accessed on 28 March 2024).
38.
Malkov, Y.; Ponomarenko, A.; Logvinov, A.; et al. Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst., 2014, 45: 61−68. doi: 10.1016/j.is.2013.10.006
39.
Liu, Y.H.; Ott, M.; Goyal, N.; et al. RoBERTa: A robustly optimized BERT pretraining approach. arXiv, 2019, arXiv: 1907.11692.
40.
Sanh, V.; Debut, L.; Chaumond, J.; et al. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv, 2019, arXiv: 1910.01108.
41.
Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; ACL: Stroudsburg, PA, USA, 2019; pp. 3980–3990. doi: 10.18653/v1/D19-1410
42.
Labonne, M.; Moran, S. Spam-T5: Benchmarking large language models for few-shot email spam detection. arXiv, 2023, arXiv: 2304.01238.
43.
Wan, Z.W. Text classification: A perspective of deep learning methods. arXiv, 2023, arXiv: 2309.13761.
44.
Lakera AI. 2023. Available online: https://www.lakera.ai (accessed on 28 March 2024).
45.
Schwenzow, J. Prompt-Injections. Huggingface.co. 2023. Available online: https://huggingface.co/datasets/JasperLS/prompt-injections (accessed on 27 December 2023).
46.
Jaramillo, R.D. ChatGPT-Jailbreak-Prompts. Huggingface.co. 2022. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts (accessed on 27 December 2023).
47.
Çimen, B. Turkish-Prompt-Injections. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/beratcmn/turkish-prompt-injections (accessed on 28 March 2024).
48.
Hamid, M.R. Prompt_Injection_Cleaned_Dataset. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/imoxto/prompt_injection_cleaned_dataset (accessed on 28 March 2024).
49.
Yugen.ai. Prompt-Injection-Mixed-Techniques-2024. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/Harelix/Prompt-Injection-Mixed-Techniques-2024 (accessed on 28 May 2024).
50.
Hackaprompt. Hackaprompt-Dataset. Huggingface.co. 2023. Available online: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset (accessed on 28 March 2024).
51.
Hamid, M.R. Prompt_Injection_Cleaned_Dataset-v2. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/imoxto/prompt_injection_cleaned_dataset-v2 (accessed on 22 April 2024).
52.
Anonymous. compass-ctf-team. 2023. Available online: https://github.com/compass-ctf-team/prompt_injection_research/blob/main/dataset (accessed on 28 March 2024).
53.
Anonymous. Fka/Awesome-Chatgpt-Prompts. Huggingface.co. Available online: https://huggingface.co/datasets/fka/awesome-chatgpt-prompts (accessed on 28 March 2024).
54.
HuggingFaceH4. ultrachat_200k. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k (accessed on 28 March 2024).
55.
Goosen. Prompt_Injection_Password_or_Secret. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/cgoosen/prompt_injection_password_or_secret (accessed on 28 March 2024).
56.
Lan, Q.L. Lanqianlong/IJNDI. 2024. Available online: https://github.com/lanqianlong/IJNDI (accessed on 28 March 2024).
57.
Lakera. Mosscap_Prompt_Injection. Huggingface.co. 2024. Available online: https://huggingface.co/datasets/Lakera/mosscap_prompt_injection (accessed on 22 January 2024).
58.
sklearn.metrics. Precision_Recall_Fscore_Support. 2019. Scikit-Learn.org. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html (accessed on 28 March 2024).

Scilight Press

Author Information

Abstract

Keywords

References

About Scilight

Journals

Publishing Policies

Contact Us