Enhancing LLM Accuracy with Retrieval-Augmented Generation (RAG)
At Howest AI Lab, we are currently participating in a European project called Art-IE, but what exactly is Art-IE?
Art-IE is a European initiative aimed at helping small businesses innovate with AI. The project consists of three main groups:
1. Applied AI Lab
2. AI Robotica Lab
3. Federated Learning Lab
Each group has its own area of expertise. Howest AI Lab is part of the Applied AI Lab, where we focus on AI applications for mobile devices, VR headsets, and more.
One of our key research areas is Retrieval-Augmented Generation (RAG)—ensuring that AI provides accurate, reliable responses without hallucinating or generating false information.

Quick facts
• Garbage in equals garbage out
• Retrieval is critical for accuracy
• Corrective RAG validates document relevance
What is RAG?
RAG stands for Retrieval-Augmented Generation, and it consists of three main components:
1. Retrieval – Finding the Right Data
2. Augmentation – Enhancing the Input
3. Generation – Creating the Final Answer

In this post, we will focus on the retrieval process and explore two RAG implementations in detail.
But first, let’s break down the different RAG steps.
1. Retrieval – Finding the Right Data
The first and arguably most crucial step is retrieval. Here, the system takes the user’s query and searches for relevant information in a database. If this step fails, the rest of the process becomes meaningless.
Why is retrieval so important?
Because if we don’t retrieve the right data, it doesn’t matter how well we refine or process it in the next steps—the final output will still be incorrect. This is often summed up in IT with the phrase: “Garbage in, garbage out.”
Imagine a user asks: “What are image embeddings?”
If the retrieval step fails and the system retrieves documents that only discuss text embeddings, the AI will generate a misleading or incorrect answer—because it never received the right information in the first place.
Good input (relevant documents about image embeddings) → Correct answer
Bad input (unrelated or missing data) → Incorrect or vague response
This is why high-quality retrieval is crucial: if the AI doesn’t have the right data, no amount of processing can fix the output.
Because of this, various retrieval techniques exist, two of which (Corrective RAG and RAG-Fusion) we will explore further in this article.
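To make the retrieval step concrete, here is a minimal sketch of basic dense retrieval in Python. The `embed` function, `documents`, and `doc_embeddings` are hypothetical placeholders for your embedding model and indexed data, not part of any specific pipeline:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, documents: list[str],
             doc_embeddings: list[np.ndarray], embed, top_k: int = 3) -> list[str]:
    # Embed the query with the same model used to embed the documents,
    # then return the top_k most similar documents.
    query_vec = embed(query)
    scores = [cosine_similarity(query_vec, vec) for vec in doc_embeddings]
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```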
2. Augmentation – Refining the Input
Once we retrieve the relevant data, the next step is augmenting it by incorporating it into the AI’s prompt. This step is crucial because it helps shape how the model generates its response.
By carefully structuring the prompt, we can guide the model’s output to match a specific format or style. Additionally, we can enforce constraints that ensure the response is strictly based on the provided documents—this is particularly useful for sensitive or highly precise topics.
Techniques like prompt engineering play a key role in this process, but we’ll explore those in more detail later.
3. Generation – Producing the Final Answer
In the final step, the system utilises either open-source or closed-source AI models to generate a response based on the retrieved and augmented data.
The choice of model is critical, as some closed-source models may not fully comply with GDPR or other ethical regulations. Selecting the right model ensures that the output aligns with both technical requirements and legal considerations.
A Simple Analogy
Imagine two people taking the same test.
• Person A relies only on what they’ve learned so far.
• Person B has the same knowledge but also has access to a library full of information and, even better, a map to quickly find exactly what they need.
RAG works like Person B, combining existing knowledge with external data retrieval to generate more accurate and context-aware answers.

Now that we know what RAG is and how it works, we can dive deeper into each part.
What Happens Before RAG?
One crucial aspect we haven’t discussed yet is what happens before the RAG process begins. How do we obtain the data? What kind of data can we use?
This is known as the preprocessing stage. For our applications, the data can come from text files, PDFs, images, and more.
We won’t go too deep into this topic in this article, as it deserves a dedicated discussion. However, to give some context, the preprocessing phase consists of:
1. Collecting the data
2. Preprocessing (e.g., chunking, text cleaning, etc.)
3. Embedding and storing data in a vector database
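As a rough illustration, this preprocessing stage might look like the sketch below. The `embed` function and `vector_store` object are hypothetical stand-ins for your embedding model and vector database of choice:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Split a document into overlapping chunks so that related
    # sentences are not cut apart at chunk boundaries.
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

def index_documents(texts: list[str], embed, vector_store) -> None:
    # Chunk each document, embed every chunk, and store the
    # (embedding, text) pairs in the vector database.
    for text in texts:
        for chunk in chunk_text(text):
            vector_store.add(embedding=embed(chunk), text=chunk)
```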
With this background in mind, let’s move on to the first step of the RAG process: Retrieval.
Implementing RAG-Fusion
One of the methods we’ve implemented is RAG-Fusion—but what exactly is RAG-Fusion?
RAG-Fusion enhances the retrieval process by generating multiple search queries, retrieving documents for each, and then reranking them for optimal relevance. It works in three key steps:
1. Query Generation
Instead of relying on a single user query, the query generator creates multiple related search queries. This helps provide additional context and increases the chances of retrieving the most relevant documents.
For example, if a user asks the simple question:
“What are image embeddings?”
The query generator might expand this into:
1. How do image embeddings work in computer vision?
2. What are the applications of image embeddings in machine learning?
3. How can I create image embeddings using Python libraries?
By broadening the scope of the query, the system can retrieve more comprehensive and contextual information, leading to clearer, more informative answers that include examples and real-world applications.
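A minimal query-generation step could be sketched as follows, where `llm` is assumed to be any callable that takes a prompt string and returns the model’s text response:

```python
QUERY_GEN_PROMPT = """Generate {n} different search queries related to the user question.
Return one query per line, without numbering.

User Question: {query}
Queries:"""

def generate_queries(query: str, llm, n: int = 3) -> list[str]:
    # Ask the LLM for n related queries, and keep the original query
    # so the user's exact wording is still searched as well.
    response = llm(QUERY_GEN_PROMPT.format(n=n, query=query))
    generated = [line.strip() for line in response.splitlines() if line.strip()]
    return [query] + generated[:n]
```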
2. Document Retrieval & Reranking
For each generated query, the system retrieves relevant documents. Since multiple queries may return overlapping or differently ranked results, we use a reranking algorithm to prioritize the most relevant information.
A commonly used method for reranking is Reciprocal Rank Fusion (RRF), which assigns each document a score based on its ranking position in every result list:

RRFscore(d) = Σ over queries q of 1 / (k + rank_q(d))

where rank_q(d) is the position of document d in the results for query q, and k is a smoothing constant (typically 60) that limits the dominance of documents ranked at the very top of a single list.
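In code, RRF can be sketched like this (the document IDs and example result lists are made up for illustration):

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    # result_lists holds one ranked list of document IDs per generated query.
    # A document scores 1 / (k + rank) in every list it appears in;
    # the scores are summed and documents are re-sorted by their totals.
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],  # results for query 1
    ["doc_b", "doc_d"],           # results for query 2
])
# doc_b comes out on top because it ranks highly in both lists.
```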
3. Augmentation & Answer Generation
Once the documents are reranked, the most relevant ones move to the augmentation step, where they are incorporated into the prompt. Finally, the AI model generates a well-informed response.
By using RAG-Fusion, we increase retrieval accuracy and enhance the overall quality of generated answers.

Implementing Corrective RAG
After RAG-Fusion, we also integrated Corrective RAG—but what exactly is Corrective RAG?
As the name suggests, Corrective RAG aims to improve the accuracy of the retrieval process by verifying the relevance of each retrieved document before it is used in the generation step.
How Does Corrective RAG Work?
Corrective RAG enhances the retrieval process by evaluating each document to ensure it aligns with the user’s query. This is done by prompting a large language model (LLM) to assess the document’s relevance.
One approach involves a semantic relevance check, where the LLM looks for keywords or related meanings within the document. The following prompt might be used:
“You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.
Give a binary score ‘yes’ or ‘no’ to indicate whether the document is relevant to the question.
User Question: {query}
Retrieved Document: {context}
Relevant:”
This method ensures that only documents containing meaningful context related to the query are considered.
Hard Corrective Prompts
For certain applications, a stricter validation is needed—especially when the answer must be explicitly present in the retrieved documents. This is useful in scenarios such as technical documentation searches, where the response should directly reference error codes or exact solutions.
In these cases, we can use a hard corrective prompt that enforces a stricter requirement:
“You are a grader assessing relevance of a retrieved document to a user question.
If the document contains the answer or part of the answer to the question, return yes; otherwise, return no.
User Question: {query}
Retrieved Document: {context}
Answer:”
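Putting this together, a document grader could be sketched as below. Again, `llm` stands in for any prompt-in, text-out model call; pass the hard corrective prompt as `prompt_template` when stricter filtering is needed:

```python
RELEVANCE_PROMPT = """You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.
Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question.
User Question: {query}
Retrieved Document: {context}
Relevant:"""

def grade_documents(query: str, documents: list[str], llm,
                    prompt_template: str = RELEVANCE_PROMPT) -> list[str]:
    # Keep only the documents the LLM grades as relevant to the query.
    relevant = []
    for doc in documents:
        verdict = llm(prompt_template.format(query=query, context=doc))
        if verdict.strip().lower().startswith("yes"):
            relevant.append(doc)
    return relevant
```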
By applying Corrective RAG, we can significantly reduce irrelevant or misleading responses, ensuring that the AI provides precise, fact-based answers.
What Happens After the Grader?
Once the grader evaluates the retrieved documents, the next step is to optimize the search query and use web crawlers or scrapers to find more relevant or accurate sources.
This can be done using custom code or existing open-source or closed-source solutions. However, this step requires careful handling, as not all online information is reliable. Search engines and websites contain misleading, outdated, or irrelevant content, so additional safeguards are necessary.
Key Considerations for Web Scraping
To improve the quality of retrieved documents, we apply several precautions:
1. Restrict sources – Define a set of approved URLs to ensure the scraper only retrieves information from trusted websites.
2. Limit depth and scope – Restrict the scraper to the top X results and prevent it from crawling too deeply into irrelevant pages.
3. Maintain query alignment – Ensure the search query remains true to the user’s original intent to avoid off-topic results.
4. Respect website policies – Always check and comply with a website’s robots.txt file and avoid scraping sites that prohibit it. This is crucial, as unauthorised scraping is becoming increasingly restricted (a minimal check covering points 1 and 4 is sketched after this list).
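A minimal version of the first and fourth precautions might look like this; the allow-list domains and user-agent string are placeholders, not real endpoints:

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

APPROVED_DOMAINS = {"docs.example.com", "support.example.com"}  # placeholder allow-list

def is_allowed(url: str, user_agent: str = "my-rag-bot") -> bool:
    # Only fetch pages on approved domains, and only if the site's
    # robots.txt permits our user agent to crawl that URL.
    parsed = urlparse(url)
    if parsed.netloc not in APPROVED_DOMAINS:
        return False
    robots = RobotFileParser(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    robots.read()
    return robots.can_fetch(user_agent, url)
```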
Final Answer Generation
Once the scraped documents are added to the context, the system can generate a final, well-informed response. To maintain transparency, we always include the URLs of the sources used so the user knows exactly where the information came from.

Augmentation – Structuring the Input
Now that we’ve retrieved the documents and verified their accuracy, we can move on to the Augmentation step.
Why is Augmentation Important?
Augmentation is just as crucial as retrieval because it determines how the LLM processes and integrates the retrieved data. Even with the right information, we still need to guide the model by clearly defining the goal and instructing it on how to use the data effectively.
The prompt must be carefully tailored to the specific use case. For example, if the system is designed for technical RAG focused on error codes, the prompt might look like this:
“Context you need to use: {context}
Query: {query}
Instructions:
Based on the provided context, analyse the relevant error codes for the user query and generate a structured response that helps troubleshoot the issue. Ensure the response includes:
•The error code
•Its description
•Possible causes
•Recommended solutions
Prioritise the most relevant and critical information to assist the user efficiently.
If the context doesn’t contain the facts or information related to the user’s query, return:
“This information is not available.”
Answer:”
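Mechanically, this step is little more than string templating: the graded documents are joined into a context block and slotted into the template. A brief sketch (with the template abbreviated; in practice you would use the full prompt shown above):

```python
TECHNICAL_RAG_PROMPT = """Context you need to use: {context}
Query: {query}
Instructions:
Based on the provided context, analyse the relevant error codes for the user
query and generate a structured response that helps troubleshoot the issue.
If the context doesn't contain the facts or information related to the user's
query, return: "This information is not available."
Answer:"""  # abbreviated version of the full prompt shown above

def build_prompt(query: str, documents: list[str]) -> str:
    # Join the graded documents into a single context block and slot it,
    # together with the user's query, into the use-case-specific template.
    context = "\n\n".join(documents)
    return TECHNICAL_RAG_PROMPT.format(context=context, query=query)
```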
By structuring the prompt in this way, we ensure that the LLM understands the objective and generates precise, well-structured responses.
As mentioned earlier, augmentation can be a complex and extensive process, but this article focuses on providing a general overview of RAG, with an emphasis on two different retrieval methods.
Final Step: Generation
The last step, Generation, is arguably the simplest of the three. Here, we take the augmented prompt and send it to our LLM of choice, which then generates a response that is delivered to the user.
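As one possible sketch, using the openai Python client (the model name is illustrative, and an open-source model served behind a compatible API would be called the same way):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_answer(augmented_prompt: str, model: str = "gpt-4o-mini") -> str:
    # Send the augmented prompt to the chosen model and return its text reply.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": augmented_prompt}],
    )
    return response.choices[0].message.content
```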
Wrapping Up
We’ve now explored the three key pillars of RAG, with a particular focus on two advanced retrieval techniques. This concludes our overview of RAG and its role in improving AI-driven search and response systems.
If you’re interested in diving deeper into specific aspects of RAG, let us know—we might cover it in a future article!
Contributors
Authors
• Tim Bleuzé, Intern
• Jens Eeckhout, AI & XR researcher
Want to know more about our team?
Visit the team page