
Model Drift: How to Keep AI Models Working in a Changing World

Original language: English
Aug 26, 2025

In a world where machine learning models are increasingly deployed in production environments, monitoring their performance over time is essential. Models that accurately predict today may already fall out of step tomorrow. This phenomenon, better known as model drift, poses a serious challenge for companies that use AI at scale.

In this blog post, we take you into the world of model drift:

  • What exactly it is and which forms it can take
  • Why it is so important to notice it early
  • How you can apply drift detection to both tabular and non-tabular data, such as images
  • And how we built a practical demo in a Jupyter Notebook that visualizes drift

Whether you are a data scientist maintaining AI solutions or an MLOps engineer focusing on monitoring, this blog post shows how to keep your models reliable in a changing world.

Quick facts

  • Model drift typically occurs within 3 to 6 months.
  • There are several forms of drift.
  • Drift in non-tabular data, such as images or audio, is harder to detect.

What is model drift and what forms does it take?

Model drift, also known as concept drift, refers to the situation in which an AI or machine learning model begins to perform worse because the data it is supposed to predict has changed compared to the data it was trained on. In a world that is continuously evolving, it makes sense that user behavior, environmental factors, or underlying trends change – and thus the relevance of the model as well.

There are different forms of model drift:

  1. Covariate Drift: Here, the distribution of the input features (X) changes, but the relationship with the output (Y) remains the same. For example: a traffic prediction model that suddenly sees many more electric vehicles in the data.
  2. Label Drift (Prior Probability Shift): Here, the distribution of the target variable (Y) changes. Think of a fraud detection model where the number of fraud cases in the dataset increases or decreases significantly.
  3. Concept Drift: Here, the relationship between input and output (P(Y|X)) changes. For example: a model that predicts customer satisfaction from review texts, while the meaning of words like "fast" or "cheap" shifts over time or across cultures.
  4. Data Quality Drift: Sometimes it is not the data itself that changes, but its quality. Sensors that function poorly or datasets that become incomplete or inconsistent can negatively impact the model.
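
To make the difference between these forms concrete, the sketch below simulates covariate drift (the input distribution shifts while the relationship stays intact) and concept drift (the relationship itself changes) on synthetic data. All names, distributions, and thresholds are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(42)

    # Reference period: inputs X ~ N(0, 1); the true rule is y = 1 whenever x > 0
    x_ref = rng.normal(loc=0.0, scale=1.0, size=10_000)
    y_ref = (x_ref > 0).astype(int)

    # Covariate drift: the input distribution shifts (mean moves to 1.5),
    # but the relationship between X and y stays exactly the same.
    x_cov = rng.normal(loc=1.5, scale=1.0, size=10_000)
    y_cov = (x_cov > 0).astype(int)

    # Concept drift: the inputs look like before, but the decision boundary
    # (the relationship P(Y|X)) has moved from 0 to 1.0.
    x_con = rng.normal(loc=0.0, scale=1.0, size=10_000)
    y_con = (x_con > 1.0).astype(int)

    def predict(x):
        """The 'model' fitted on the reference period: predict 1 whenever x > 0."""
        return (x > 0).astype(int)

    # Label drift shows up as a change in the share of positives (P(Y))
    print("positive rate, reference vs covariate-drifted:", y_ref.mean(), y_cov.mean())

    # Accuracy stays perfect under pure covariate drift, but drops under concept drift
    print("accuracy under covariate drift:", (predict(x_cov) == y_cov).mean())
    print("accuracy under concept drift:  ", (predict(x_con) == y_con).mean())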

Detecting and addressing model drift ensures that your AI model remains relevant and reliable, even as the world around it changes.

How can you detect model drift?

Detecting model drift is crucial to avoid your model deteriorating invisibly. Fortunately, there are various strategies and tools to proactively monitor this, both for tabular and non-tabular data.

Statistical monitoring

The simplest way to detect drift is by regularly comparing distributions. Think of:

  1. Kolmogorov-Smirnov tests for continuous features
  2. Chi-square tests for categorical variables
  3. Population Stability Index (PSI) to measure differences in distributions over time

These methods are mainly used for tabular data and are easy to automate.
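
As a minimal illustration of these tests, the sketch below compares a feature's training-time distribution with its production distribution using SciPy's Kolmogorov-Smirnov test and a small hand-rolled PSI function. The sample data, bin count, and the commonly cited PSI alert threshold of 0.2 are assumptions for the example.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    reference = rng.normal(loc=50, scale=10, size=5_000)  # feature at training time
    current = rng.normal(loc=55, scale=12, size=5_000)    # same feature in production

    # Kolmogorov-Smirnov: a small p-value means the two distributions likely differ
    stat, p_value = ks_2samp(reference, current)
    print(f"KS statistic={stat:.3f}, p-value={p_value:.4f}")

    def psi(expected, actual, bins=10):
        """Population Stability Index between two samples of one feature."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        e_perc = np.histogram(expected, bins=edges)[0] / len(expected)
        a_perc = np.histogram(actual, bins=edges)[0] / len(actual)
        # avoid log(0) / division by zero on empty bins
        e_perc = np.clip(e_perc, 1e-6, None)
        a_perc = np.clip(a_perc, 1e-6, None)
        return np.sum((a_perc - e_perc) * np.log(a_perc / e_perc))

    # Rule of thumb often used in practice: PSI > 0.2 signals meaningful drift
    print(f"PSI={psi(reference, current):.3f}")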

Model performance tracking

A decrease in metrics such as accuracy, precision, recall, or AUC may indicate drift. Important here:

  1. Measure performance on recent ground-truth data
  2. Use rolling windows for trend analysis
  3. Combine with confidence monitoring (how certain is the model of its predictions?)
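
A minimal sketch of such tracking with pandas is shown below; the prediction log, its column names, the window size, and the alert threshold are all assumptions for the example.

    import pandas as pd

    # Assumed prediction log: one row per scored example, with (delayed) ground truth
    log = pd.DataFrame({
        "timestamp": pd.date_range("2025-01-01", periods=8, freq="D"),
        "y_true":     [1, 0, 1, 1, 0, 1, 0, 0],
        "y_pred":     [1, 0, 1, 0, 0, 0, 1, 0],
        "confidence": [0.95, 0.90, 0.88, 0.60, 0.85, 0.55, 0.52, 0.80],
    })

    log["correct"] = (log["y_true"] == log["y_pred"]).astype(int)

    # Rolling accuracy and rolling mean confidence over a 4-prediction window;
    # in production you would typically roll over a time window instead (e.g. 7 days).
    log["rolling_accuracy"] = log["correct"].rolling(window=4).mean()
    log["rolling_confidence"] = log["confidence"].rolling(window=4).mean()

    # Simple alert rule (the threshold is an assumption, tune it to your use case)
    if log["rolling_accuracy"].iloc[-1] < 0.75:
        print("Possible drift: rolling accuracy dropped below 0.75")

    print(log[["timestamp", "rolling_accuracy", "rolling_confidence"]])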

Embedding or representation comparison

For non-tabular data (such as images, text, or audio), classical distribution analysis is often not sufficient. Here you use:

  1. Embeddings from a neural network (such as activations from an intermediate layer)
  2. Visualizations such as t-SNE or UMAP
  3. Comparison via Fréchet Inception Distance (FID) or Maximum Mean Discrepancy (MMD)

For example, you can detect whether new images contain fundamentally different content than your training set.

Dedicated tools

Platforms such as Azure Machine Learning and AWS SageMaker offer built-in model monitoring. They support:

  1. Automatic drift detection
  2. Dashboards with real-time statistics
  3. Alerts for exceeding threshold values

For example, in Azure ML you can use DataDriftDetector to compare datasets through schemas and statistics, including visual comparison of features.

It is not enough to use one metric or method. A robust detection strategy combines statistics, performance monitoring, and embedding comparison, tailored to the data type of your model input.

How do you detect model drift in non-tabular data?

In contrast to structured tables, detecting drift in images, audio, or text is less straightforward. You cannot easily compare the average pixel value of a photo as you would with a column in an Excel file. Nevertheless, there are effective methods to detect model drift even in non-tabular data. Below are some approaches:

Use of feature or embedding spaces

Instead of comparing the raw data itself (such as the pixel values of images), you can use a model to first transform the data into a more meaningful representation:

  1. For example: use activations from an intermediate layer of a CNN (such as VGG or ResNet)
  2. For text: use transformer-based embeddings (such as BERT)
  3. For audio: use MFCCs or spectrograms as representation.

These vector representations (embeddings) provide a summary of the content of the input, making it possible to measure changes in "meaning" or "content" over time.
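
As an illustration, the sketch below extracts such embeddings for images with the penultimate layer of a pretrained ResNet-18 from torchvision. This is not the exact setup of our demo; the batch shapes and the random placeholder data are assumptions.

    import torch
    import torchvision

    # Pretrained backbone with the final classification layer removed, so the
    # output is a 512-dimensional embedding per image instead of class scores.
    # (weights="DEFAULT" needs torchvision >= 0.13; older versions use pretrained=True.)
    backbone = torchvision.models.resnet18(weights="DEFAULT")
    embedder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

    @torch.no_grad()
    def embed(batch: torch.Tensor) -> torch.Tensor:
        """Map a batch of images (N, 3, 224, 224), already resized and
        ImageNet-normalized, to embeddings of shape (N, 512)."""
        return embedder(batch).flatten(start_dim=1)

    # Random tensors standing in for real, preprocessed winter/summer batches
    winter_emb = embed(torch.rand(8, 3, 224, 224))
    summer_emb = embed(torch.rand(8, 3, 224, 224))
    print(winter_emb.shape)  # torch.Size([8, 512])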

Visual techniques

Once you have embeddings, you can apply techniques such as:

  1. t-SNE or UMAP to visualize the distribution of the data in 2D
  2. Cluster analysis to discover new patterns or outliers
  3. Overlays of before-and-after data to illustrate changes

These techniques help you intuitively understand whether the structure of your input data has changed since training your model.
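
A minimal sketch of such a visualization with scikit-learn's t-SNE and matplotlib is shown below; the random arrays merely stand in for real embedding matrices of the old and new data.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # Stack reference (training-time) and production embeddings into one matrix
    # so both end up in the same 2D projection.
    winter_emb = np.random.randn(200, 512)        # placeholder embeddings
    summer_emb = np.random.randn(200, 512) + 0.5  # placeholder, slightly shifted
    all_emb = np.vstack([winter_emb, summer_emb])
    labels = np.array(["winter"] * len(winter_emb) + ["summer"] * len(summer_emb))

    projection = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(all_emb)

    for name, color in [("winter", "tab:blue"), ("summer", "tab:orange")]:
        mask = labels == name
        plt.scatter(projection[mask, 0], projection[mask, 1], s=8, c=color, label=name)
    plt.legend()
    plt.title("t-SNE of image embeddings: training data vs production data")
    plt.show()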

Comparison metrics between distributions

You can measure the distance between the distributions of old and new data using:

  • Maximum Mean Discrepancy (MMD)
  • Fréchet Inception Distance (FID) for images
  • Cosine similarity or Euclidean distance between embedding centroids

When these distances exceed a certain threshold, you have an indication of drift.
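
As an illustration, the sketch below computes a simple RBF-kernel MMD estimate and the cosine similarity between embedding centroids with NumPy; the placeholder arrays and the kernel bandwidth are assumptions, and any alert threshold should be tuned per use case.

    import numpy as np

    def mmd_rbf(x, y, gamma=None):
        """A (biased) MMD^2 estimate with an RBF kernel between two samples
        of shape (n, d) and (m, d)."""
        if gamma is None:
            gamma = 1.0 / x.shape[1]  # simple default bandwidth
        def kernel(a, b):
            sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
            return np.exp(-gamma * sq_dists)
        return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

    def centroid_cosine_similarity(x, y):
        """Cosine similarity between the mean embeddings of two datasets."""
        cx, cy = x.mean(axis=0), y.mean(axis=0)
        return float(cx @ cy / (np.linalg.norm(cx) * np.linalg.norm(cy)))

    winter_emb = np.random.randn(200, 512)        # placeholder embeddings
    summer_emb = np.random.randn(200, 512) + 0.5  # placeholder, shifted distribution

    print("MMD^2:", mmd_rbf(winter_emb, summer_emb))
    print("centroid cosine similarity:", centroid_cosine_similarity(winter_emb, summer_emb))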

Using Deepchecks and similar tools

Tools such as Deepchecks provide specific checks such as ImageDatasetDrift, which automatically:

  1. Processes the data in batches
  2. Compares distributions using statistics and embeddings
  3. Shows possible deviations in visualizations

In our own demo, we used deepchecks.vision (Python library) to detect drift between original images and their brightly lit counterparts.

Practical example: Detection of visual drift

To make model drift in non-tabular data tangible, we built a demo around image classification with Deepchecks Vision. The scenario: a company uses an image recognition model to classify objects on a conveyor belt and automatically sort them by type. The camera capturing these images is located next to a window. During the summer months, the extra sunlight results in much brighter images than in winter, a difference that the model was not aware of during its training. This leads to a potential risk of model drift due to changing lighting conditions.

Comparison of winter data vs summer data

When we visually examine the difference in image data between winter and summer, the problem becomes immediately clear. In winter, the images are darker and more in line with the model's original training data. In summer, however, the extra daylight causes overexposed images, which affects the color distribution and contrasts.

This visual shift may seem subtle at times, but it can have significant consequences for an image classification model that is sensitive to lighting conditions. Below you can see a comparison of typical winter images versus brightly lit summer images, as captured by the production line camera.

As can be seen, a model that was only trained on winter data may struggle with accurate predictions on summer images, which contributes to model drift.

Drift analysis with Deepchecks

For the detection, we used the ImageDatasetDrift check from the deepchecks.vision library. Both datasets were first converted into batches of numpy arrays with corresponding labels via BatchOutputFormat.
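
In outline, the setup looked like the sketch below. Note that this is a simplified reconstruction: winter_images, summer_images, and their labels are placeholders, and the exact VisionData arguments can differ between Deepchecks versions, so treat it as a blueprint rather than copy-paste code.

    from deepchecks.vision import VisionData, BatchOutputFormat
    from deepchecks.vision.checks import ImageDatasetDrift

    def make_batches(images, labels, batch_size=32):
        """Batches in the format Deepchecks expects: HWC uint8 images plus labels."""
        return [
            BatchOutputFormat(images=images[i:i + batch_size], labels=labels[i:i + batch_size])
            for i in range(0, len(images), batch_size)
        ]

    # winter_images / summer_images: lists of numpy arrays (H, W, 3) with values 0-255
    winter_data = VisionData(make_batches(winter_images, winter_labels), task_type="classification")
    summer_data = VisionData(make_batches(summer_images, summer_labels), task_type="classification")

    check = ImageDatasetDrift()
    result = check.run(train_dataset=winter_data, test_dataset=summer_data)

    # Overall drift score between the two datasets (closer to 1 = stronger drift)
    print(result.value["domain_classifier_drift_score"])
    result.show()  # per-feature importance plots, e.g. Brightness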

This check yields the following result:


Each feature is assigned an importance score in the analysis, indicating the extent to which that property contributed to the detected drift between the datasets. This is particularly valuable for understanding why and which visual properties have shifted. In this case, it appears that Brightness plays a dominant role in the distinction between the training data and the test data.

The visualizations below, generated by Deepchecks, clearly show which features had the greatest impact on the detected drift. The Train Dataset contains the winter images on which the model was originally trained, while the Test Dataset consists of summer images that are noticeably brighter due to increased light exposure.

Thanks to this interpretable output, we can specifically analyze which visual features in the data have shifted, and thus which factors may be responsible for the decline in model performance. This way, we not only receive a signal that there is drift, but also gain insight into where and how the data has shifted, which is essential for monitoring and adjustment in production environments.

The check also returns a drift score, a number that summarizes how much shift has occurred between the two datasets. In the visualization below, you can see how this score should be interpreted; the closer to 1, the stronger the drift.

In our case, the domain_classifier_drift_score was 0.9698, indicating a clear shift in data. Such a high score suggests that the model is likely to experience significant performance issues on the new data.

Influence on model performance

To concretize the impact of dataset drift on model performance, we trained a simple image classifier based on the VGG architecture. This model was trained exclusively on winter images, which are representative of the original production process in which the model was developed. The model was trained to correctly classify two classes of objects in images.

When testing this model on a separate set of winter images, that is, data from exactly the same distribution it was trained on, the model achieved a perfect accuracy of 100%. This indicates that the model has fully mastered the task as long as the input images visually match the training data.

Then we conducted exactly the same test, but with the summer images as input data. These images are identical in content (same classes), but differ in visual properties such as lighting: the summer images are significantly brighter due to the incoming sunlight. Despite the task being the same at the content level, the accuracy dropped to only 78.57%.
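
For reference, such a comparison boils down to evaluating one and the same model on two test sets. The sketch below assumes a trained PyTorch classifier model and two DataLoaders winter_loader and summer_loader, all placeholders; our demo used a VGG-style network, but any classifier fits this pattern.

    import torch

    @torch.no_grad()
    def accuracy(model, loader, device="cpu"):
        """Fraction of correctly classified samples in a DataLoader."""
        model.eval()
        correct, total = 0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
        return correct / total

    # Same model, same task, different visual conditions
    print(f"winter accuracy: {accuracy(model, winter_loader):.2%}")  # ~100% in our demo
    print(f"summer accuracy: {accuracy(model, summer_loader):.2%}")  # ~78.6% in our demo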

This performance drop clearly indicates that the model is sensitive to visual shifts that fall outside its training distribution. Although the task content remained the same, the change in light intensity (which we also detected earlier through the drift check) resulted in a significant decrease in reliability. In a production environment, this can lead to incorrect classifications and downstream errors, for example in automatic sorting systems.

In closing

Model drift is not a hypothetical problem; it is a reality for every AI application in production. By detecting and understanding changes in data early, you can prevent unexpected performance losses. In this article, we not only explained the theory but also demonstrated how to concretely measure and explain drift in visual data using an image classification demo with Deepchecks.

Do you want to keep your models reliable in a changing world? Then continuous monitoring, even for non-tabular data, is not a luxury but a necessity.

Be sure to check out our masterclass Deploying AI Solutions, where model drift is covered in detail. More info on the Howest website: https://www.howest.be/en/education/training/masterclass-deploying-ai-solutions

Authors

  • Hube Knaepkens, intern
  • Nathan Segers, Lecturer XR and MLOps

Want to know more about our team?

Visit the team page