Counterfactuals: How AI makes decisions understandable
What is a counterfactual? Now one of the core concepts in the field of Explainable Artificial Intelligence (XAI), it originally comes from philosophy. Counterfactual thinking is rooted in the age-old question of cause and effect: David Hume gave one of the first 'counterfactual definitions' of causation, which David Lewis later developed into a theory of counterfactual conditionals, the essence being that a cause must be 'something that makes a difference' to the outcome.
In the field of XAI, it is used today to examine a hypothetical situation or outcome that could have been prevented, or could have turned out differently, under different circumstances. In a machine learning context, a counterfactual explanation (CFE) has a specific purpose, raising the following question: 'What minimal change in the input would result in a different prediction?'
In this blog post, we will explore why this question is so useful, what advantages asking it brings in the context of AI and ML, and how counterfactuals are being used in newer fields such as LLMs.
Quick facts
- Counterfactuals show how decisions can be changed.
- LLM counterfactuals are still under development.
- Rashomon effect: multiple correct answers possible.
I. Using Counterfactuals in Smart Systems
Why use counterfactuals at all? Transparency, trust, and recourse.
AI models are becoming increasingly complex, and various ways are being explored to gain more insight into these black-box models. Being able to demonstrate why a model makes a decision is not only important but also ethically required. Under the EU GDPR (Recital 71 & Article 22), although the wording remains somewhat ambiguous, individuals should be given the right to receive an explanation after an automated decision.
One advantage of CFEs over other explainability techniques is their comprehensibility and human-friendliness. Instead of demonstrating which features contributed the most to a decision, they provide a prescriptive path for the user or developer to reach a desired output. A classic example is a loan application.

If a loan is 'denied', what should a user have or do in order to obtain the loan after all? An explanation technique other than a CFE could note that the refusal stemmed from a low income and many outstanding loans. That already says something about where the problem lies, even though it does not make tangible how high the salary should be or how much outstanding debt is still acceptable. A counterfactual, on the other hand, will give an explanation such as: 'If your salary were higher than 5,000 and your outstanding debts lower than 200, the loan would have been granted.'
This concreteness is what makes counterfactuals so well suited to explaining model decisions.
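To make the distinction concrete, here is a minimal, purely illustrative Python sketch. The `loan_approved` rule and the numbers are invented stand-ins for a trained model, chosen only to mirror the thresholds in the example above.

```python
def loan_approved(salary, outstanding_debt):
    # Toy stand-in for a trained model, using the thresholds from the example above.
    return salary > 5000 and outstanding_debt < 200

original = {"salary": 3200, "outstanding_debt": 450}        # the applicant's actual situation
counterfactual = {"salary": 5100, "outstanding_debt": 150}  # the proposed minimal change

print(loan_approved(**original))        # False -> loan denied
print(loan_approved(**counterfactual))  # True  -> the counterfactual 'flips' the decision
```

A feature-importance explanation would only tell the applicant that salary and debt mattered; the counterfactual hands them the concrete targets that flip the outcome.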
The Qualities and Challenges of Effective Counterfactuals
A counterfactual should always add value and not cause confusion. The following three aspects are important for this (a small scoring sketch follows this list):
- Similarity: A CFE should always be as similar as possible to the original input, which means that there should be a minimal number of adjustments to features. This helps to keep the interpretation of the model transparent and understandable.
- Feasibility: The adjustment must also be realistic. For example, do not propose a "what if" scenario that would require a person to be 120 years old.
- Diversity: Offering several diverse alternatives can represent more complex models more clearly.
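As a rough illustration of how these criteria can be checked in code, the sketch below scores a single candidate counterfactual on validity, similarity (L1 distance), and feasibility; diversity would be assessed across a whole set of candidates. The function and its arguments are hypothetical and not taken from any particular library; a scikit-learn-style classifier with a `predict` method is assumed.

```python
import numpy as np

def score_counterfactual(model, x, x_cf, desired_class, immutable_idx=()):
    """Score one candidate counterfactual against the criteria above (toy sketch)."""
    x, x_cf = np.asarray(x, dtype=float), np.asarray(x_cf, dtype=float)
    # Validity: does the candidate actually produce the desired prediction?
    valid = model.predict([x_cf])[0] == desired_class
    # Similarity: how far does the candidate stray from the original input?
    proximity = np.abs(x_cf - x).sum()
    # Feasibility: immutable features (e.g. age) must remain untouched.
    feasible = all(x_cf[i] == x[i] for i in immutable_idx)
    return valid, proximity, feasible
```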
Even if you follow these rules, counterfactuals can still lead to ambiguities. Here I refer back to ethics and psychology, specifically the 'causality paradox'. An AI system captures associations and correlations in data, not direct cause-and-effect. So if an AI system gives the following explanation for a loan rejection: "Your salary should be higher," a wrong conclusion would be that a higher salary will always guarantee a loan. The model selects a solution based on a statistical correlation, while a human reads it as a cause-effect relationship. Counterfactuals must therefore always be provided responsibly, and misinformation should be avoided through disclaimers or other safeguards.
There is also another challenge that recurs in the literature: the 'Rashomon effect'. This indicates that there can be multiple counterfactual explanations that are all equally correct. That is not necessarily a problem; it is possible that an adjustment of feature A has the same effect as an adjustment of feature B, or vice versa. Techniques such as DiCE can be used here: the goal is then not to find a single counterfactual but a list of the most suitable ones.

II. Counterfactuals Across the AI Spectrum: A Model-Specific Approach
2.1 Prediction Models and Tabular Data
In classical predictive models, counterfactuals are very intuitive. Typical applications include insurance, diagnosis, and energy forecasting. Finding a counterfactual for tabular data usually comes down to searching for the smallest adjustment to an input that changes the model's output; that adjustment is then presented as a proposal.
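A minimal sketch of that search process, assuming a scikit-learn-style classifier with `predict` and `predict_proba` and classes labeled 0..k-1: greedily nudge one feature at a time until the prediction flips. Real libraries use more principled optimization, but the idea is the same.

```python
import numpy as np

def greedy_counterfactual(model, x, desired_class, step_sizes, max_steps=50):
    """Greedily nudge one feature at a time until the model's prediction flips."""
    x_cf = np.array(x, dtype=float)
    for _ in range(max_steps):
        if model.predict([x_cf])[0] == desired_class:
            return x_cf  # a small set of adjustments that changes the output
        best = None
        for i, step in enumerate(step_sizes):
            for delta in (+step, -step):
                candidate = x_cf.copy()
                candidate[i] += delta
                # Keep the nudge that moves the prediction closest to the desired class.
                proba = model.predict_proba([candidate])[0][desired_class]
                if best is None or proba > best[0]:
                    best = (proba, candidate)
        x_cf = best[1]
    return None  # no counterfactual found within the search budget
```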
2.2 Neural Networks and Image Classifiers
For neural networks, how a CFE is generated depends on the type of model, and it often requires more involved mathematics because of the model's greater complexity. For differentiable models, the search for a counterfactual is commonly framed as a gradient-based optimization problem, and there is active research into frameworks and methods for these types of models.
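For a differentiable model, one common formulation (in the spirit of Wachter et al.) optimizes a copy of the input so that the prediction moves toward a target class while an L1 penalty keeps the counterfactual close to the original. The PyTorch sketch below is a simplified illustration, not a production method; `model` (a classifier taking a batch of feature vectors), `target_class`, and the weighting `lam` are assumptions.

```python
import torch
import torch.nn.functional as F

def gradient_counterfactual(model, x, target_class, lam=0.1, steps=300, lr=0.05):
    """Gradient-based counterfactual search for a differentiable classifier (sketch)."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x_cf.unsqueeze(0))
        # Push the prediction toward the desired class...
        pred_loss = F.cross_entropy(logits, target)
        # ...while staying as close as possible to the original input (similarity).
        dist_loss = torch.norm(x_cf - x, p=1)
        (pred_loss + lam * dist_loss).backward()
        optimizer.step()
    return x_cf.detach()
```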
When we look at the subfield of image classification, there is another challenge. Adjusting a single pixel value may be enough to make a classifier misbehave, yet such a change is neither realistic nor meaningful to humans. A new generation of techniques therefore examines GANs and diffusion models to generate counterfactuals that test a classifier while remaining realistic images. For example, a system could generate a counterfactual for a chest X-ray classified as "pneumonia" by showing how the X-ray would have looked if the patient had not suffered from the disease. This could provide more value than a simple graph indicating the difference.
2.3 Counterfactuals and Large Language Models (LLMs)
The application of counterfactuals to Large Language Models (LLMs) is a rapidly evolving area within XAI. The goal is to understand how a small change in an input prompt or context would have led to a different output. This is quite different from traditional counterfactual generation, which relies on a predefined model and a search process.
Instead of a predefined model and search process, LLMs often work with so-called self-generated counterfactual explanations (SCEs). Here, the model itself is asked to formulate possible alternatives. This usually happens through an iterative process: an initial prompt is given to the model, followed by an output. Then, a new prompt is created that encourages the model to come up with an adjusted input (the counterfactual) that would result in a different output. This can be done through both "open prompting" and "rationale-based prompting," where the model first identifies the key factors in the original input and then suggests a possible modification.
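A simplified sketch of such a rationale-based prompting loop is shown below. `call_llm` is a hypothetical wrapper around whatever LLM API is in use (it takes a prompt string and returns the model's text response); the prompts and the sentiment task are illustrative only.

```python
def self_generated_counterfactual(call_llm, review):
    """Ask an LLM to explain itself and propose a counterfactual input (sketch)."""
    # Step 1: the original prediction.
    answer = call_llm(f"Classify the sentiment of this review as positive or negative:\n{review}")
    # Step 2 (rationale-based prompting): which parts of the input drove the answer?
    rationale = call_llm(
        f"Review: {review}\nYour answer was: {answer}\n"
        "Which words or phrases were most decisive for this answer?"
    )
    # Step 3: ask for a minimally edited input that would flip the answer.
    counterfactual = call_llm(
        f"Review: {review}\nDecisive factors: {rationale}\n"
        "Rewrite the review with as few changes as possible so that "
        "the sentiment classification would flip."
    )
    # Step 4: verify by feeding the proposed counterfactual back to the model.
    new_answer = call_llm(f"Classify the sentiment of this review as positive or negative:\n{counterfactual}")
    return counterfactual, answer, new_answer
```

The final verification step matters in practice, because the model's self-generated explanation is not guaranteed to actually flip its own prediction.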
Research shows, however, that this approach still has limitations. LLMs are particularly good at generating explanations, but these often turn out to be not entirely reliable or faithful to the actual process behind the decision. The quality of such explanations varies greatly depending on the task and model architecture. The risk is that the explanation is more a reflection of the accumulated parametric knowledge of the model, and not of the actual reasoning that led to the output. For the user, the explanation may therefore seem plausible, but still be misleading. This emphasizes that further methodological development is needed to make counterfactuals in LLMs not only plausible but also consistent and correct.
III. A Toolkit for Counterfactual Generation: Commercial and Open-Source Solutions
The application of counterfactuals is increasingly supported by a growing ecosystem of commercial products and open-source libraries. These tools assist in generating, evaluating, and applying counterfactual explanations in various contexts.
3.1 Major Cloud Provider Offerings
Large cloud platforms have integrated explainability features into their services, allowing users easy access to counterfactuals.
Microsoft Azure: The Microsoft Azure Responsible AI Dashboard provides an integrated suite of tools for fairness, interpretation, and error analysis. An important component is the What-if counterfactuals function, based on the open-source DiCE library. This allows users to determine which features may be adjusted and within what limits. This results in valid and logical explanations that are useful for both model debugging and providing concrete recommendations to end users.
Google Cloud: Google's What-If Tool is an interactive, model-independent environment where users can experiment with counterfactuals. Through a visual interface, it is easy to check how changes in input values affect the prediction.
Amazon Web Services (AWS): Within Amazon SageMaker there is SageMaker Clarify, a toolkit focused on bias detection and explainability. A notable feature is the Counterfactual Flip Test, which examines whether predictions "flip" between different protected groups. Although the emphasis is on bias, the underlying techniques for transparency and feature importance, such as Kernel SHAP, closely align with counterfactual concepts.
3.2 Leading Open-Source Libraries
For those who prefer to have more control or need customization, there are powerful open-source solutions available.
- DiCE (Diverse Counterfactual Explanations): A leading library developed by Microsoft. DiCE can generate diverse and realistic counterfactuals for various models. It supports both black-box approaches (random search, genetic search) and gradient-based methods for differentiable models (TensorFlow, PyTorch). Moreover, constraints are adjustable so that the results remain feasible (see the usage sketch after this list).
- CARLA (Counterfactual and Recourse Library): A Python library focused on comparing and benchmarking different counterfactual methods. With ready-to-use datasets and models, CARLA makes it easier to test the performance and assumptions of algorithms.
- Alibi: A broader library that offers multiple XAI methods. Its counterfactual component focuses on sparse, in-distribution explanations, with support for immutable features (e.g., age, gender) that should not be modified.
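As a concrete illustration, the sketch below follows DiCE's documented interface for a scikit-learn model on the loan example. The synthetic dataframe, its column names, and the chosen parameters are assumptions for the sake of a runnable example, and exact argument names may differ between library versions.

```python
import dice_ml
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic loan data standing in for a real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "salary": rng.integers(2000, 9000, 500),
    "outstanding_debt": rng.integers(0, 1000, 500),
})
df["loan_approved"] = ((df["salary"] > 5000) & (df["outstanding_debt"] < 200)).astype(int)

model = RandomForestClassifier().fit(df.drop(columns="loan_approved"), df["loan_approved"])

data = dice_ml.Data(dataframe=df,
                    continuous_features=["salary", "outstanding_debt"],
                    outcome_name="loan_approved")
wrapped = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data, wrapped, method="random")  # model-agnostic random search

# Ask for three diverse counterfactuals that flip the prediction,
# only varying features the applicant can realistically change.
query = df.drop(columns="loan_approved").iloc[[0]]
result = explainer.generate_counterfactuals(query,
                                            total_CFs=3,
                                            desired_class="opposite",
                                            features_to_vary=["salary", "outstanding_debt"])
result.visualize_as_dataframe(show_only_changes=True)
```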
3.3 The Toolkit for Explanations: Convergence and Complementarity
No single explainability technique provides a complete picture of how a model works. The strongest approach is often a combination of methods. Counterfactuals are ideal for providing actionable recourse, while techniques like SHAP give insight into the relative contribution of each feature. Together they provide a more holistic view: SHAP tells why a decision was made, counterfactuals show how the decision can be changed. This complementarity, as illustrated in Azure's Responsible AI Dashboard, forms a robust toolkit to debug models, make them transparent, and strengthen user trust.
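Continuing the loan sketch above, combining the two views could look roughly like this. `shap.TreeExplainer` is part of the SHAP library, while `model`, `query`, and `result` are the hypothetical objects from the DiCE sketch.

```python
import shap

# 'Why': per-feature contributions to this particular prediction.
shap_values = shap.TreeExplainer(model).shap_values(query)
print(shap_values)

# 'How': the minimal changes that would flip that same prediction,
# taken from the DiCE result generated above.
result.visualize_as_dataframe(show_only_changes=True)
```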

Counterfactuals as a bridge between humans and models
Counterfactual explanations occupy a unique place within the broader field of explainable AI. While many techniques primarily focus on revealing the importance of features, counterfactuals provide a path to action: they show how an outcome could have been changed. This makes them particularly powerful for making models not only more transparent but also more usable and human-friendly for end users.
As we have seen, counterfactuals play a role across the entire spectrum of AI models: from classical tabular predictions to complex neural networks and the latest generation of LLMs. At the same time, it remains important to consider their limitations and challenges, such as reliability, feasibility, and correctly interpreting cause versus correlation. However, the rise of commercial and open-source toolkits clearly shows that practice is evolving further and that a rich ecosystem is emerging to effectively deploy these techniques.
Counterfactuals are therefore not a magic solution, but they are an essential part of the toolbox for anyone working on transparency, fairness, and trust in AI. By combining them with other methods, a more complete picture emerges that benefits developers, researchers, and end users alike.
Do you have questions or would you like to discuss this further? Our research group is actively working on MLOps, Agents, and LLMs and is always open to sharing experiences and helping where possible.
Sources:
Counterfactual reasoning has emerged as a crucial area of research in artificial intelligence, particularly in understanding and improving the reliability of model decision-making.
https://medium.com/data-science/counterfactuals-in-language-ai-956673049b64
https://www.europarl.europa.eu/RegData/etudes/STUD/2020/641530/EPRS_STU(2020)641530_EN.pdf
https://christophm.github.io/interpretable-ml-book/counterfactual.html
https://github.com/uiuc-focal-lab/LLMCert-B
Open Source Library Provides Explanation for Machine Learning Through Diverse Counterfactuals
Contributors
Authors
- Jens Krijgsman, Automation & AI researcher, Teamlead
- Nathan Segers, Lecturer XR and MLOps
Want to know more about our team?
Visit the team page
