Project

Adversarial Prompt Detection in LLMs

Built a real-time adversarial prompt detection pipeline using fine-tuned transformer models for identifying jailbreak and prompt-injection style attacks. The project demonstrates practical AI safety engineering with measurable performance and deployment potential.

GitHub Repo

Built a real-time adversarial prompt filter for GPT-4 using fine-tuned RoBERTa and DistilBERT models.

Evaluated the system on roughly 100K prompts and achieved strong accuracy and recall in malicious prompt detection.

Used AWS SageMaker for training and fine-tuning to harden the pipeline against evolving jailbreak methods.

LLMsSecurityPyTorch