Project
Adversarial Prompt Detection in LLMs
Built a real-time adversarial prompt detection pipeline that uses fine-tuned transformer models to identify jailbreak and prompt-injection attacks. The project is an exercise in practical AI safety engineering, with measured detection performance and a path to deployment.
Built a real-time adversarial prompt filter for GPT-4 using fine-tuned RoBERTa and DistilBERT models.
Evaluated the system on roughly 100K prompts and achieved strong accuracy and recall in malicious prompt detection.
Used AWS SageMaker for training and fine-tuning to harden the pipeline against evolving jailbreak methods.
LLMs, Security, PyTorch