Prompt Injection Detection in LLM Integrated Applications
Author Information
Abstract
The integration of large language models (LLMs) into creative applications has unlocked new capabilities but also introduced vulnerabilities, notably prompt injections. These are malicious inputs designed to manipulate model responses, posing threats to security, privacy, and functionality. This paper delves into the mechanisms of prompt injections, their impacts, and presents novel detection strategies. More specifically, the necessity for robust detection systems is outlined, a predefined list of banned terms is combined to embed techniques for similarity search, and a BERT (Bidirectional Encoder Representa- tions from Transformers) model is built to identify and mitigate prompt injections effectively with the aim to neutralize prompt injections in real-time. The research highlights the challenges in balancing secu- rity with usability, evolving attack vectors, and LLM limitationsm, and emphasizes the significance of securing LLM-integrated applications against prompt injections to preserve data privacy, user trust, and uphold ethical standards. This work aims to foster collaboration for developing standardized security frameworks, contributing to more safer and reliable AI-driven systems.
References

This work is licensed under a Creative Commons Attribution 4.0 International License.