MITIGATING HALLUCINATIONS IN LARGE LANGUAGE MODELS: A COMPARATIVE STUDY OF RETRIEVAL-AUGMENTED GENERATION (RAG) TECHNIQUES

Authors

  • Prasad Maderamitla

DOI:

https://doi.org/10.46121/pspc.54.2.35

Keywords:

Hallucination, Retrieval-Augmented Generation, Large Language Models, Dense Retrieval, Hybrid RAG, Faithfulness, NLP Evaluation

Abstract

Hallucination, the generation of factually incorrect, fabricated, or contextually inconsistent content, remains one of the most significant obstacles to deploying large language models (LLMs) in production systems. This paper presents a systematic comparative study of Retrieval-Augmented Generation (RAG) techniques as a mitigation strategy. We evaluate five configurations: a parametric baseline (no retrieval), Naive RAG, Dense Retrieval RAG, Hybrid RAG, and Advanced RAG with cross-encoder reranking and query expansion. Experiments on the TriviaQA, Natural Questions (NQ), HotpotQA, and FEVER benchmarks show that Advanced RAG reduces the hallucination rate from 33.9% (baseline) to as low as 3.5%, while achieving a ROUGE-L of 0.75 and a faithfulness score of 0.91. Ablation studies identify context filtering and reranking as the most impactful components. Our findings provide actionable guidelines for practitioners seeking to deploy reliable, fact-grounded language generation systems.
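
To put the headline figures in perspective, the short sketch below converts the two hallucination rates quoted in the abstract into an absolute and a relative reduction. The helper name `reduction` is hypothetical, and the calculation assumes the 33.9% and 3.5% figures are directly comparable (same benchmark and same metric definition), which the abstract's "as low as" wording does not guarantee.

```python
def reduction(baseline_pct: float, treated_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline rate must be positive")
    absolute = baseline_pct - treated_pct
    relative = 100.0 * absolute / baseline_pct
    return absolute, relative


# Rates quoted in the abstract: parametric baseline vs. best Advanced RAG result.
absolute, relative = reduction(33.9, 3.5)
print(f"{absolute:.1f} percentage points, {relative:.1f}% relative reduction")
```

Under that assumption, the best reported Advanced RAG result corresponds to a drop of about 30.4 percentage points, or roughly a 90% relative reduction in hallucination rate.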

Published

2026-05-28