RAG & Knowledge Cutoff

From knowledge-free language models to knowledge-grounded AI: How do we teach AI facts it won't forget?

Solving AI Hallucination · Vector Databases · Factual Truth

1. Theory: The Base Language Model

A base Language Model is fundamentally a Next-Word Predictor. It is trained on massive datasets to understand linguistic syntax, grammar, and statistical relationships between words.

It understands how to speak, but it does not possess a factual database. It relies entirely on its neural weights to "remember" facts, treating factual retrieval exactly the same as grammar completion.

It maps human words into numerical indices (Vocabulary).
It uses Self-Attention to weigh the importance of context words.
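
The two mechanisms above can be sketched in a few lines of NumPy. This is a toy illustration, not any particular model's implementation: the sentence, the random weights, the dimension `d_model`, and the helper names `softmax` and `self_attention` are all assumptions made for the example, and a real language model would also add positional information and a causal mask.

```python
import numpy as np

# Vocabulary: every distinct word gets an integer index (toy example).
sentence = "the cat sat on the mat".split()
word2idx = {word: i for i, word in enumerate(sorted(set(sentence)))}
token_ids = [word2idx[w] for w in sentence]

rng = np.random.default_rng(0)
d_model = 8  # hypothetical embedding size
embedding_table = rng.normal(size=(len(word2idx), d_model))
x = embedding_table[token_ids]  # shape: (seq_len, d_model)

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Scaled dot-product attention over a single sequence.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # row i: how much word i attends to each word
    return weights @ v, weights

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, so every word's new representation is a weighted average of the words around it. Nothing in this computation consults a store of facts.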

2. The Hallucination Danger

The Problem

When you ask a base model a highly specific factual question (e.g., "What is Thirukkural 1250?"), it does not search a database. It simply runs the words through its attention layers and outputs the word with the highest statistical probability.

This results in Hallucination. The AI will invent a fake two-line poem that sounds like ancient Tamil, and present it as fluently as it would a real one. It is a probability engine, not a truth engine.

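# The Prediction (Guessing the Truth)
import torch

# Assumes `model`, `tensor_input`, and `idx2word` were built during training,
# and that the model returns one row of scores over the vocabulary.
with torch.no_grad():
    predictions = model(tensor_input)

    # argmax always returns some word, even when no score is high:
    # there is no "I don't know" option.
    pred_idx = predictions.argmax().item()
    predicted_word = idx2word[pred_idx]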
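
To see why this selection step cannot signal uncertainty, here is a small self-contained sketch. The four-word vocabulary and the scores are made up for illustration. The point is that the scores are almost flat, yet `argmax` still returns a single answer.

```python
import numpy as np

def softmax(logits):
    # Convert raw scores into probabilities that sum to 1.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical vocabulary and near-uniform scores (the model "knows" nothing here).
idx2word = {0: "virtue", 1: "wealth", 2: "love", 3: "fear"}
logits = np.array([1.2, 1.1, 1.0, 1.3])

probs = softmax(logits)
pred_idx = int(np.argmax(probs))
predicted_word = idx2word[pred_idx]
top_prob = float(probs[pred_idx])
```

Here the winning word holds less than 30% of the probability mass, barely above the 25% that pure chance would give, but the output is just a word. The reader never sees how weak the preference was.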
                

3. Retrieval-Augmented Generation

To reduce hallucination, we must separate Language Generation from Knowledge Storage. We stop expecting the LLM to memorize facts, and instead give it an "Open Book Exam".

Retrieve

We store factual data as embeddings in a Vector Database. When a user asks a question, we embed the question too and search the database for the stored passages closest to it in meaning.

Augment

We paste the retrieved text verbatim into the AI's prompt.

Generate

We instruct the LLM to reason over the provided facts instead of relying on its own unreliable memory.

4. RAG Implementation: Retrieval

Instead of guessing, we use Gemini Embeddings to convert the user's problem into a mathematical vector, and find the closest Thirukkural in our database.

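# Assumes `client` (a google-genai Client), `user_input`, `corpus_embeddings`
# (one vector per Kural, from the same embedding model) and `df` (one row per
# Kural, in the same order) already exist.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1: Embed the user's problem into a vector
res = client.models.embed_content(
    model="text-embedding-004",
    contents=user_input
)
query_embedding = res.embeddings[0].values

# Step 2: Score every stored Kural using Cosine Similarity
similarities = [
    cosine_similarity(query_embedding, emb)
    for emb in corpus_embeddings
]

# Step 3: Extract the closest match, with its exact stored text
best_idx = int(np.argmax(similarities))
best_kural = df.iloc[best_idx]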
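
For more than a handful of vectors, the per-item loop can be replaced by one matrix product. The following is a minimal sketch with made-up two-dimensional vectors. The function name `top_k_cosine` and the parameter `k` are introduced here for illustration, and the code assumes no vector has zero length.

```python
import numpy as np

def top_k_cosine(query, corpus, k=3):
    """Return (indices, scores) of the k corpus rows most similar to query."""
    query = np.asarray(query, dtype=float)
    corpus = np.asarray(corpus, dtype=float)
    # Normalize to unit length so a dot product equals cosine similarity.
    query_unit = query / np.linalg.norm(query)
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = corpus_unit @ query_unit
    # Sort descending and keep the k best matches.
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

# Toy data standing in for real embeddings.
corpus_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_embedding = np.array([2.0, 0.1])
indices, scores = top_k_cosine(query_embedding, corpus_embeddings, k=2)
```

Returning several candidates with their scores, rather than only the single best index, also makes it possible to notice when even the best match is a poor one.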
                

5. RAG Implementation: Augment & Generate

We constrain the LLM by injecting the retrieved Kural into the prompt, so the model responds from the supplied text and not from memory.

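# Step 4: AUGMENT - Inject into prompt
kural_text = f"{best_kural['Line1']}\n{best_kural['Line2']}"

prompt = f"""
Respond to the user's problem using only the Kural below.
Do not quote any other verse.

User's problem: {user_input}
Retrieved Kural: {kural_text}
"""

# Step 5: GENERATE - AI responds
# "gemini" is a placeholder; substitute a model ID available to your account.
response = client.models.generate_content(
    model="gemini",
    contents=prompt
)
print(response.text)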
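
Nearest-neighbor search always returns something, so a weak match would still be injected as if it were relevant. One possible guard, sketched below, is to skip augmentation when the similarity score is low. The function `build_prompt`, the constant `MIN_SIMILARITY`, and its value of 0.5 are assumptions for illustration, not part of the original pipeline. A sensible cutoff would have to be tuned on real queries.

```python
from typing import Optional

MIN_SIMILARITY = 0.5  # hypothetical cutoff, not a recommended value

def build_prompt(user_input: str, kural_text: str, similarity: float) -> Optional[str]:
    """Return a grounded prompt, or None when the match is too weak to trust."""
    if similarity < MIN_SIMILARITY:
        return None
    return (
        "Respond using only the Kural quoted below. "
        "Do not quote any verse that is not shown here.\n"
        f"User's problem: {user_input}\n"
        f"Retrieved Kural: {kural_text}\n"
    )

# Placeholder text stands in for a couplet fetched from the database.
strong = build_prompt("I keep postponing hard work.", "<retrieved couplet>", 0.82)
weak = build_prompt("What is the capital of France?", "<retrieved couplet>", 0.12)
```

When `build_prompt` returns `None`, the application can answer that no relevant Kural was found instead of calling the model at all.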
                

6. Before vs After Comparison

Base LLM

Question: "What is Thirukkural 433?"

"Confidence without fear leads to success
Therefore, one must always be brave"

Hallucination! This is not in the Thirukkural!

RAG System

Question: "What is Thirukkural 433?"

"To fear when fear is right is wisdom's sign
But fear of fear itself shows foolish mind"

Grounded! Quoted from the database, not from the model's memory

7. Benefits of RAG

Fewer Hallucinations: The AI is anchored to the data we provide, so it is far less likely to invent fake Kurals.
Instant Updates: Teaching new facts doesn't require retraining the neural network.
Auditable Truth: We can clearly trace which data the AI used to generate its response.
Cost Efficient: Updating a Vector Database is far cheaper than retraining large models.
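
The "Instant Updates" and "Auditable Truth" points can be illustrated with a tiny in-memory stand-in for a vector database. The class `TinyVectorStore`, the document IDs, and the three-dimensional vectors are all hypothetical. A production system would use a real vector database and real embeddings.

```python
import numpy as np

class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, doc_id, vector):
        # Adding knowledge is an append; no model weights change.
        v = np.asarray(vector, dtype=float)
        self.ids.append(doc_id)
        self.vectors.append(v / np.linalg.norm(v))

    def search(self, query):
        # Return the ID of the best match so the answer can be traced to its source.
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q
        best = int(np.argmax(scores))
        return self.ids[best], float(scores[best])

store = TinyVectorStore()
store.add("kural-1", [1.0, 0.0, 0.0])
store.add("kural-2", [0.0, 1.0, 0.0])
before = store.search([0.0, 0.1, 1.0])  # best available match is a poor one

store.add("kural-3", [0.0, 0.0, 1.0])   # new fact added with a single call
after = store.search([0.0, 0.1, 1.0])
```

After the single `add` call, the same query resolves to the new entry, and the returned ID records exactly which stored item the response was based on.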

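Base LLM: Grammar from model weights | Facts from model weights (unreliable)
RAG System: Grammar from model weights | Facts from the retrieved database entry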