Scholarly open access journals, Peer-reviewed, and Refereed Journals, Impact factor 8.14 (Calculate by google scholar and Semantic Scholar | AI-Powered Research Tool) , Multidisciplinary, Monthly, Indexing in all major database & Metadata, Citation Generator, Digital Object Identifier(DOI)
This paper introduces BitRAG, a lightweight, privacy-focused Retrieval-Augmented Generation (RAG) system for answering questions over documents on resource-constrained devices. As digital document collections grow, users need tools that can retrieve the right answer quickly, especially from large libraries. Traditional keyword search falls short because it misses context, relationships, and deeper meaning in the text. BitRAG addresses these issues by running compact language models (1 to 3 billion parameters) entirely on local hardware through the Ollama framework. With no dependency on cloud APIs, user data stays private while response quality remains solid. The system uses ChromaDB for vector storage with Hierarchical Navigable Small World (HNSW) indexes, returning search results in under 10 milliseconds even over large document collections. For embeddings, we use the sentence-transformers model all-MiniLM-L6-v2, which produces dense 384-dimensional semantic vectors; with about 22 million parameters, it runs efficiently on CPU alone, processing up to 150 documents per second. This setup keeps data private and organized, which suits environments where each user's information must be isolated from that of other users. Retrieval combines vector similarity with BM25 keyword scoring through Reciprocal Rank Fusion (RRF), so it benefits from both semantic and keyword retrieval. In the experiments, BitRAG found the correct answer every time in “needle-in-haystack” tests with 20 target documents hidden among many others. It scores 82.5% on answer quality, achieves perfect faithfulness and precision, and reduces hallucination risk by 70% on out-of-scope questions. On standard CPUs, the average response time is 7.4 seconds and throughput is 28.7 tokens per second, showing that small models can handle document question answering without heavy-duty hardware. The system runs smoothly on ordinary machines with just 4GB of RAM, making advanced document analysis widely accessible.
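To illustrate the Reciprocal Rank Fusion step mentioned in the abstract, the following is a minimal sketch rather than BitRAG's actual implementation. It assumes each retriever (vector search and BM25) returns an ordered list of document IDs, and it uses the smoothing constant k = 60 that is common in the RRF literature; the abstract does not state the value BitRAG uses. The function name and the sample document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k = 60 is an assumed default, not a value
    taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Because RRF works only on rank positions, it does not require the vector-similarity and BM25 scores to be on a comparable scale, which is one reason it is a convenient choice for hybrid search.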
Keywords:
Retrieval-Augmented Generation, Local Inference, Small Language Models, Ollama, Vector Search, Privacy-Preserving, ChromaDB, HNSW Indexing, Hybrid Search, BM25
Cite Article:
"BitRAG: A Lightweight Local Retrieval-Augmented Generation System for Privacy-Preserving Document Question-Answering", International Journal for Research Trends and Innovation (www.ijrti.org), ISSN:2456-3315, Vol.11, Issue 4, page no.b730-b737, April-2026, Available :http://www.ijrti.org/papers/IJRTI2604238.pdf
ISSN:
2456-3315 | IMPACT FACTOR: 8.14 Calculated By Google Scholar| ESTD YEAR: 2016