
Enhancing LLM Accuracy with Retrieval-Augmented Generation (RAG)

At AI Lab, we are currently participating in a European project called Art-IE. But what exactly is Art-IE?

Art-IE is a European initiative aimed at helping small businesses innovate with AI. The project consists of three main groups:

1. Applied AI Lab

2. AI Robotica Lab

3. Federated Learning Lab

Each group has its own area of expertise. Howest AI Lab is part of the Applied AI Lab, where we focus on AI applications for mobile devices, VR headsets, and more.

One of our key research areas is Retrieval-Augmented Generation (RAG)—ensuring that AI provides accurate, reliable responses without hallucinating or generating false information.


Quick facts

• Garbage in equals garbage out

• Retrieval is critical for accuracy

• Corrective RAG validates document relevance

What is RAG?

RAG stands for Retrieval-Augmented Generation, and it consists of three main components:

1. Retrieval – Finding the Right Data

2. Augmentation – Enhancing the Input

3. Generation – Creating the Final Answer

In this post, we will focus on the retrieval process and explore two RAG implementations in detail.

But first, let’s break down the different RAG steps.

1. Retrieval – Finding the Right Data

The first and arguably most crucial step is retrieval. Here, the system takes the user’s query and searches for relevant information in a database. If this step fails, the rest of the process becomes meaningless.

Why is retrieval so important?

Because if we don’t retrieve the right data, it doesn’t matter how well we refine or process it in the next steps—the final output will still be incorrect. This is often summed up in IT with the phrase: “Garbage in, garbage out.”

Imagine a user asks: “What are image embeddings?”

If the retrieval step fails and the system retrieves documents that only discuss text embeddings, the AI will generate a misleading or incorrect answer—because it never received the right information in the first place.

Good input (relevant documents about image embeddings) → Correct answer

Bad input (unrelated or missing data) → Incorrect or vague response

This is why high-quality retrieval is crucial: if the AI doesn’t have the right data, no amount of processing can fix the output.

Because of this, a variety of retrieval techniques exist. We will explore two of them, RAG-Fusion and Corrective RAG, in the rest of this article.
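Before we get to those techniques, here is a minimal sketch of what a basic retrieval step can look like: cosine similarity between a query embedding and stored document embeddings. The embed() function below is a toy stand-in, not a real model; in practice you would call a sentence-transformer or an embeddings API.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash words into a
    fixed-size vector. Replace with a proper embedding model in practice."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def retrieve(query: str, chunks: list[str], chunk_vectors: np.ndarray, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    # Cosine similarity between the query vector and every stored chunk vector.
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]
```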

2. Augmentation – Refining the Input

Once we retrieve the relevant data, the next step is augmenting it by incorporating it into the AI’s prompt. This step is crucial because it helps shape how the model generates its response.

By carefully structuring the prompt, we can guide the model’s output to match a specific format or style. Additionally, we can enforce constraints that ensure the response is strictly based on the provided documents—this is particularly useful for sensitive or highly precise topics.

Techniques like prompt engineering play a key role in this process, but we’ll explore those in more detail later.

3. Generation – Producing the Final Answer

In the final step, the system utilises either open-source or closed-source AI models to generate a response based on the retrieved and augmented data.

The choice of model is critical, as some closed-source models may not fully comply with GDPR or other ethical regulations. Selecting the right model ensures that the output aligns with both technical requirements and legal considerations.

A Simple Analogy

Imagine two people taking the same test.

Person A relies only on what they’ve learned so far.

Person B has the same knowledge but also has access to a library full of information and, even better, a map to quickly find exactly what they need.

RAG works like Person B, combining existing knowledge with external data retrieval to generate more accurate and context-aware answers.

Now that we know what RAG is and how it works, we can dive deeper into each part.

What Happens Before RAG?

One crucial aspect we haven’t discussed yet is what happens before the RAG process begins. How do we obtain the data? What kind of data can we use?

This is known as the preprocessing stage. For our applications, the data can come from text files, PDFs, images, and more.

We won’t go too deep into this topic in this article, as it deserves a dedicated discussion. However, to give some context, the preprocessing phase consists of:

1. Collecting the data

2. Preprocessing (e.g., chunking, text cleaning, etc.)

3. Embedding and storing the data in a vector database
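As a rough illustration, this pipeline can be sketched in a few lines, reusing the toy embed() from the retrieval sketch above. A real setup would use a smarter chunker and a proper vector database rather than an in-memory matrix.

```python
import numpy as np

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split a text into overlapping character windows, one simple
    chunking strategy among many."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# 1. Collect the data (two placeholder documents here).
documents = ["A long document about image embeddings...",
             "Another document about preprocessing and text cleaning..."]

# 2. Preprocess: chunk every document.
chunks = [c for doc in documents for c in chunk(doc)]

# 3. Embed each chunk and stack the vectors into an in-memory "vector database".
chunk_vectors = np.vstack([embed(c) for c in chunks])
```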

With this background in mind, let’s move on to the first step of the RAG process: Retrieval.

Implementing RAG-Fusion

One of the methods we’ve implemented is RAG-Fusion—but what exactly is RAG-Fusion?

RAG-Fusion enhances the retrieval process by generating multiple search queries, retrieving documents for each, and then reranking them for optimal relevance. It works in three key steps:

1. Query Generation

Instead of relying on a single user query, the query generator creates multiple related search queries. This helps provide additional context and increases the chances of retrieving the most relevant documents.

For example, if a user asks the simple question:

“What are image embeddings?”

The query generator might expand this into:

1. How do image embeddings work in computer vision?

2. What are the applications of image embeddings in machine learning?

3. How can I create image embeddings using Python libraries?

By broadening the scope of the query, the system can retrieve more comprehensive and contextual information, leading to clearer, more informative answers that include examples and real-world applications.
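A minimal sketch of such a query generator, built on a generic call_llm() helper. The helper is hypothetical: it stands in for whichever model or API you actually use.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM of choice (local or hosted).
    Replace the body with an actual API or inference call."""
    raise NotImplementedError

def generate_queries(user_query: str, n: int = 3) -> list[str]:
    """Ask the LLM to expand the user's query into n related search queries."""
    prompt = (
        f"Generate {n} search queries related to the question below, "
        "one per line, each covering a different angle of the topic.\n\n"
        f"Question: {user_query}"
    )
    expansions = [q.strip() for q in call_llm(prompt).splitlines() if q.strip()]
    # Keep the original query so the user's exact wording is still searched.
    return [user_query] + expansions[:n]
```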

2. Document Retrieval & Reranking

For each generated query, the system retrieves relevant documents. Since multiple queries may return overlapping or differently ranked results, we use a reranking algorithm to prioritize the most relevant information.

A commonly used method for reranking is Reciprocal Rank Fusion (RRF), which assigns a score based on ranking position:
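RRF_score(d) = Σ over all queries q of 1 / (k + rank_q(d))

Here rank_q(d) is the position of document d in the result list for query q, and k is a smoothing constant (the original RRF paper uses k = 60). Documents that appear near the top of several result lists accumulate the highest fused scores. A minimal sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
    k = 60 is the constant suggested in the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```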

3. Augmentation & Answer Generation

Once the documents are reranked, the most relevant ones move to the augmentation step, where they are incorporated into the prompt. Finally, the AI model generates a well-informed response.

By using RAG-Fusion, we increase retrieval accuracy and enhance the overall quality of generated answers.

Implementing Corrective RAG

After RAG-Fusion, we also integrated Corrective RAG—but what exactly is Corrective RAG?

As the name suggests, Corrective RAG aims to improve the accuracy of the retrieval process by verifying the relevance of each retrieved document before it is used in the generation step.

How Does Corrective RAG Work?

Corrective RAG enhances the retrieval process by evaluating each document to ensure it aligns with the user’s query. This is done by prompting a large language model (LLM) to assess the document’s relevance.

One approach involves a semantic relevance check, where the LLM looks for keywords or related meanings within the document. The following prompt might be used:

“You are a grader assessing relevance of a retrieved document to a user question.

If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.

Give a binary score ‘yes’ or ‘no’ to indicate whether the document is relevant to the question.

User Question: {query}

Retrieved Document: {context}

Relevant:”

This method ensures that only documents containing meaningful context related to the query are considered.

Hard Corrective Prompts

For certain applications, a stricter validation is needed—especially when the answer must be explicitly present in the retrieved documents. This is useful in scenarios such as technical documentation searches, where the response should directly reference error codes or exact solutions.

In these cases, we can use a hard corrective prompt that enforces a stricter requirement:

“You are a grader assessing relevance of a retrieved document to a user question.

If the document contains the answer or part of the answer to the question, return yes; otherwise, return no.

User Question: {query}

Retrieved Document: {context}

Answer:”
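A minimal sketch of a grader supporting both styles, reusing the hypothetical call_llm() helper from the query-generation sketch. The prompts follow the two templates above.

```python
SOFT_CRITERION = (
    "If the document contains keyword(s) or semantic meaning related to the "
    "question, grade it as relevant. Give a binary score 'yes' or 'no' to "
    "indicate whether the document is relevant to the question."
)

HARD_CRITERION = (
    "If the document contains the answer or part of the answer to the "
    "question, return yes; otherwise, return no."
)

def grade_document(query: str, context: str, strict: bool = False) -> bool:
    """Return True if the LLM grades the document as relevant.
    strict=True applies the hard corrective criterion."""
    prompt = (
        "You are a grader assessing relevance of a retrieved document to a "
        f"user question.\n{HARD_CRITERION if strict else SOFT_CRITERION}\n\n"
        f"User Question: {query}\n\nRetrieved Document: {context}\n\nRelevant:"
    )
    return call_llm(prompt).strip().lower().startswith("yes")
```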

By applying Corrective RAG, we can significantly reduce irrelevant or misleading responses, ensuring that the AI provides precise, fact-based answers.

What Happens After the Grader?

Once the grader has evaluated the retrieved documents, and whenever they turn out to be irrelevant or insufficient, the next step is to optimise the search query and use web crawlers or scrapers to find more relevant or accurate sources.

This can be done with custom code or with existing open-source or closed-source solutions. However, this step requires careful handling, as not all online information is reliable: search engines and websites can return misleading, outdated, or irrelevant content, so additional safeguards are necessary.

Key Considerations for Web Scraping

To improve the quality of retrieved documents, we apply several precautions:

1. Restrict sources – Define a set of approved URLs to ensure the scraper only retrieves information from trusted websites.

2. Limit depth and scope – Restrict the scraper to the top X results and prevent it from crawling too deeply into irrelevant pages.

3. Maintain query alignment – Ensure the search query remains true to the user’s original intent to avoid off-topic results.

4. Respect website policies – Always check and comply with a website’s robots.txt file and avoid scraping sites that prohibit it. This is crucial, as unauthorised scraping is becoming increasingly restricted.
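The first and fourth precautions are straightforward to enforce in code. Below is a minimal sketch using Python’s standard-library robots.txt parser; the domain allowlist is, of course, a made-up example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist of trusted domains (precaution 1).
APPROVED_DOMAINS = {"docs.example-vendor.com", "arxiv.org"}

def allowed_to_scrape(url: str, user_agent: str = "my-rag-bot") -> bool:
    """Check a URL against the allowlist and the site's robots.txt
    (precautions 1 and 4)."""
    parsed = urlparse(url)
    if parsed.netloc not in APPROVED_DOMAINS:
        return False
    robots = RobotFileParser()
    robots.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    robots.read()  # fetch and parse the site's robots.txt
    return robots.can_fetch(user_agent, url)
```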

Final Answer Generation

Once the scraped documents are added to the context, the system can generate a final, well-informed response. To maintain transparency, we always include the URLs of the sources used so the user knows exactly where the information came from.

Augmentation – Structuring the Input

Now that we’ve retrieved the documents and verified their accuracy, we can move on to the Augmentation step.

Why is Augmentation Important?

Augmentation is just as crucial as retrieval because it determines how the LLM processes and integrates the retrieved data. Even with the right information, we still need to guide the model by clearly defining the goal and instructing it on how to use the data effectively.

The prompt must be carefully tailored to the specific use case. For example, if the system is designed for technical RAG focused on error codes, the prompt might look like this:

“Context you need to use: {context}

Query: {query}

Instructions:

Based on the provided context, analyse the relevant error codes for the user query and generate a structured response that helps troubleshoot the issue. Ensure the response includes:

•The error code

•Its description

•Possible causes

•Recommended solutions

Prioritise the most relevant and critical information to assist the user efficiently.

If the context doesn’t contain the facts or information related to the user’s query, return:

“This information is not available.”

Answer:”

By structuring the prompt in this way, we ensure that the LLM understands the objective and generates precise, well-structured responses.
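Putting augmentation and generation together might look like the sketch below, again built on the hypothetical call_llm() helper, here with a generic instruction rather than the error-code template above.

```python
def answer_query(query: str, relevant_docs: list[str]) -> str:
    """Augment the prompt with the graded documents and generate the answer
    (steps 2 and 3 of RAG)."""
    context = "\n\n".join(relevant_docs)
    prompt = (
        f"Context you need to use: {context}\n\n"
        f"Query: {query}\n\n"
        "Instructions:\n"
        "Based on the provided context, answer the user's query. If the "
        "context doesn't contain the facts or information related to the "
        "user's query, return: \"This information is not available.\"\n\n"
        "Answer:"
    )
    return call_llm(prompt)
```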

As mentioned earlier, augmentation can be a complex and extensive process, but this article focuses on providing a general overview of RAG, with an emphasis on two different retrieval methods.

Final Step: Generation

The last step, Generation, is arguably the simplest of the three. Here, we take the augmented prompt and send it to our LLM of choice, which then generates a response that is delivered to the user.

Sources:

https://arxiv.org/pdf/2402.03367

https://arxiv.org/pdf/2401.15884

Wrapping Up

We’ve now explored the three key pillars of RAG, with a particular focus on two advanced retrieval techniques. This concludes our overview of RAG and its role in improving AI-driven search and response systems.

If you’re interested in diving deeper into specific aspects of RAG, let us know—we might cover it in a future article!

Authors

• Tim Bleuzé, Intern

• Jens Eeckhout, AI & XR researcher

Want to know more about our team?

Visit the team page