AILab Howest


Workshop Private Agentic AI

Oct 6, 2025

In the context of the Art-IE project (funded by Interreg Flanders Netherlands), Howest AI Lab organized a workshop on "Private Agentic AI". We took a deeper look at the following topics:

  • How LLMs understand natural language
  • RAG & AI agents
  • LLM frameworks & guardrailing
  • Hardware and software requirements for infrastructure

The focus was on a privacy-first approach where we host everything ourselves on our new, powerful AI server. This blog post briefly discusses the content of the workshop and the tools used. A less technical workshop that is accessible to everyone will follow in mid-November (more info at the bottom).

You can already sign up via the form.

Quick facts

  • Agents automate your workflow.

  • LLM guardrails protect your chatbot.

  • Kubernetes & NVIDIA for AI servers

Content of the workshop

Embedding models

A computer does not perceive meaning in text like a human does, but it can perform calculations with numbers at high speed. An embedder is an AI model that turns text into a series of numbers that we call a vector. These vector representations have two important properties:

  1. They contain information about the relationships between words. For example, it can be shown using vectors that "brunch" lies somewhere between "breakfast" and "lunch" in terms of meaning. Embedders also assign similar numbers to synonyms.
  2. We can use them to express the strength of similarity between two words. Mathematical measures such as "cosine similarity" underlie this.

There are many open-source embedding models available on HuggingFace and in Ollama. During the workshop, we got to work with EmbeddingGemma, a recent release from Google.
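The two properties above can be illustrated with a few lines of Python. This is a minimal sketch: the three-dimensional "embeddings" below are hand-made toy vectors chosen for illustration (a real model like EmbeddingGemma outputs hundreds of dimensions), but the cosine similarity calculation is the same one used in practice.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hand-made for illustration, not real model output).
breakfast = [0.9, 0.1, 0.0]
lunch     = [0.1, 0.9, 0.0]
brunch    = [0.5, 0.5, 0.0]  # sits between breakfast and lunch
bicycle   = [0.0, 0.0, 1.0]  # unrelated concept

print(cosine_similarity(brunch, breakfast))  # high: related meanings
print(cosine_similarity(brunch, bicycle))    # zero: unrelated meanings
```

Because "brunch" points between "breakfast" and "lunch", its similarity to both is high, while its similarity to "bicycle" is essentially zero.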

RAG

LLMs are limited in knowledge to their training dataset. They can also hallucinate and are sometimes biased.

Retrieval Augmented Generation (RAG) is an implementation of LLMs where we can query our own knowledge database. The chatbot can therefore consult sources and cite them in its response. This largely avoids the problems with LLMs, although RAG can also make mistakes.

Such a knowledge database typically consists of your business documents, or the documentation of machines in an industrial context. These are often confidential or contain sensitive personal data.

To guarantee privacy, we can host the entire RAG pipeline locally ourselves. There are open-source "all-in-one" platforms such as RAGFlow that you can run with Docker on any computer. During the workshop, we built a RAG pipeline manually with LangChain in Python.
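The core retrieve-then-generate idea can be sketched without any framework. This is a dependency-free illustration, not the LangChain API: the toy `score` function uses word overlap where a real pipeline would compare embedding vectors, and the final prompt would be sent to a locally hosted LLM rather than printed.

```python
def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words found in the document.
    A real RAG pipeline compares embedding vectors instead."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant documents from the knowledge database."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user's question with retrieved context, so the LLM
    answers from known sources instead of hallucinating."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using ONLY the sources below and cite them.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

# Toy knowledge database of business documents.
docs = [
    "The milling machine must be serviced every 200 operating hours.",
    "Safety goggles are mandatory in the workshop.",
    "The cafeteria serves lunch from 12:00 to 13:30.",
]
prompt = build_prompt("When must the milling machine be serviced?", docs)
# `prompt` would now go to a locally hosted LLM (e.g. via Ollama).
```

Frameworks like LangChain wire up exactly these steps (embedding, vector search, prompt augmentation) with production-grade components.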

If privacy is not a priority, there are also cloud alternatives such as NotebookLM from Google.

Agents

An agent is an LLM that can perform actions. Some examples:

  • Querying a knowledge database (RAG)
  • Fetching the weather report
  • Sending an email in your name

In the context of agents, we refer to these actions as "tools". To make the LLM aware of the available tools, we use the MCP protocol.

Many apps now also offer an MCP server; it contains the tools with which an LLM can perform actions on that app. A developer or a platform such as n8n can use this to build an agent. During the workshop, we set up our own local MCP server with FastMCP in Python.
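Conceptually, an MCP server is a registry that advertises each tool's name, description, and parameters so an LLM can decide when and how to call it. The sketch below illustrates that idea in plain Python; it is not the FastMCP API (FastMCP additionally handles the actual MCP wire protocol), and the two example tools return dummy data.

```python
import inspect

# Registry of available tools: name -> description, parameters, callable.
TOOLS: dict[str, dict] = {}

def tool(fn):
    """Register a function as a tool, exposing its signature and docstring
    (this metadata is what the LLM sees when choosing a tool)."""
    TOOLS[fn.__name__] = {
        "description": (fn.__doc__ or "").strip(),
        "parameters": list(inspect.signature(fn).parameters),
        "fn": fn,
    }
    return fn

@tool
def get_weather(city: str) -> str:
    """Fetch the weather report for a city (dummy data in this sketch)."""
    return f"Sunny in {city}, 21 °C"

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email on the user's behalf (stubbed out in this sketch)."""
    return f"Email '{subject}' sent to {to}"

def call_tool(name: str, **kwargs) -> str:
    """What the agent loop does after the LLM picks a tool and arguments."""
    return TOOLS[name]["fn"](**kwargs)

print(call_tool("get_weather", city="Bruges"))
```

The agent loop then becomes: show the LLM the registry, let it pick a tool and fill in the arguments, execute the call, and feed the result back into the conversation.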

Guardrailing

LLMs and agents are vulnerable to malicious prompts (prompt injection or jailbreaking). Unwanted outputs such as profanity, personal data, or wrong languages can also be problematic. Extra security measures are therefore essential in production environments.

During the workshop, we used Guardrails AI, a Python library that contains pre-made guardrails. You can use it to pre- or post-process user prompts and LLM responses and perform checks on them. Under the hood, it uses various techniques, one of which is fine-tuned language models specialized in guardrailing tasks (e.g. Llama Guard 3). Another open-source alternative is Granite Guardian from IBM.
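The pre- and post-processing pattern itself is simple to illustrate. This is a conceptual sketch only, not the Guardrails AI API: the two regex patterns are illustrative assumptions, and real guardrail libraries use far more robust, often model-based checks (such as Llama Guard 3).

```python
import re

# Illustrative patterns only; production guardrails use model-based checks.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal your system prompt",
]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def check_prompt(prompt: str) -> str:
    """Pre-processing guard: refuse prompts that look like injection attempts
    before they ever reach the LLM."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            raise ValueError("Prompt rejected by guardrail")
    return prompt

def scrub_response(response: str) -> str:
    """Post-processing guard: mask personal data (here: email addresses)
    before the LLM's answer reaches the user."""
    return EMAIL_RE.sub("[REDACTED]", response)

print(scrub_response("Contact jan@example.com for support."))
```

In production, both guards sit in the request path around the LLM call: every prompt passes `check_prompt` on the way in and every answer passes `scrub_response` on the way out.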

Infrastructure

During the workshop, we also delved deeper into the infrastructure requirements to run AI applications. In many cases, you need powerful NVIDIA GPUs. Through subsidies via the Interreg Flanders-Netherlands Art-IE project & a VLAIO infrastructure call, Howest has purchased a new AI server with the specifications below:

  • CPU: 2x Intel Xeon Platinum 8562Y+ (128 threads)
  • GPU: 8x NVIDIA H200 SXM (141 GB VRAM each)
  • RAM: 2 TB DDR5
  • Storage: 2x Dell NVMe 7500 (3.84 TB) + 5x Micron 9300 (3.5 TB)

Keep in mind that on top of the purchase price there are quite a few hidden costs: electricity, licenses & software, support, …

In terms of software, we use Kubernetes in combination with Kubeflow. This is an open-source ecosystem that integrates well with NVIDIA's hardware & software. Kubeflow primarily consists of a modular central dashboard. You can add modules to enable the creation of Jupyter Notebooks, pipelines, model training, etc., and manage their resources.
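On such a cluster, GPU access is requested declaratively. The manifest below is an illustrative sketch of how a pod asks Kubernetes for one GPU via the NVIDIA device plugin; the pod name and container image are placeholders, not our actual configuration.

```yaml
# Illustrative pod spec requesting one GPU (names and image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: notebook-gpu
spec:
  containers:
    - name: jupyter
      image: your-registry/jupyter-gpu:latest
      resources:
        limits:
          nvidia.com/gpu: 1   # scheduled onto a node with a free GPU
```

Kubeflow generates specs like this for you when you spin up a notebook from its dashboard, which is how it manages GPU resources across users.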


Next workshop

Do you want to get inspired about how you can automate and optimize your work with agents? In mid-November, within the framework of the AIUPD8 project, we are organizing a follow-up workshop where we get hands-on with agent platforms like n8n, Copilot, ChatGPT agents, Relevance AI, and more. For these platforms you don't even need programming knowledge, so the workshop is accessible to everyone!

Sources

Cosine similarity

https://huggingface.co/models?other=embeddings

https://ollama.com/search?c=embedding

https://developers.googleblog.com/en/introducing-embeddinggemma

https://ragflow.io/

https://www.docker.com

https://www.langchain.com

https://notebooklm.google

https://modelcontextprotocol.io/docs/getting-started/intro

https://n8n.io

https://gofastmcp.com/getting-started/welcome

https://www.guardrailsai.com/

https://ollama.com/library/llama-guard3

Granite Guardian

https://kubernetes.io/

https://www.kubeflow.org/

https://aiupdate.be/

Authors

  • Thomas Huyghebaert, AI Researcher

Want to know more about our team?

Visit the team page