Researchers at the University of Amsterdam (UvA) are working on a new method to make AI models understandable and explainable to humans. AI models can solve many tasks, but they are also becoming increasingly complex. The field of Explainable AI (XAI) is concerned with unpacking the complex behavior of these models in a way that humans can understand. In a new project, HUE: bridging AI Representations to Human-Understandable Explanations, researchers Giovanni Cinà and Sandro Pezzelle are developing a method that will make it possible to ‘x-ray’ AI models and make them more transparent.
Confirmation bias
“Many AI models are black boxes,” explains Pezzelle. “We can feed them a lot of data, and they can make a prediction – which may or may not be correct – but we do not know what goes on internally.” This is problematic because we tend to interpret the output according to our own expectations, a tendency known as confirmation bias.
Cinà: “We are more likely to believe explanations that match our prior beliefs. We accept more easily what makes sense to us, and that can lead us to trust models that are not really trustworthy. This is a big problem, for instance, when we use AI models to interpret medical data in order to detect disease. Unreliable models may start to influence doctors and lead them to misinterpret results.”
Examining explanations
The researchers are developing a method to mitigate this confirmation bias. “We want to align what we think the model is doing with what it is actually doing,” Cinà says. “To make a model more transparent, we need to examine some explanations for why it came up with a certain prediction.” To do this, the researchers are creating a formal framework that allows them to formulate human-understandable hypotheses about what the model has learned and to test these hypotheses more precisely.
Pezzelle: “Our method can be applied to any machine learning or deep learning model as long as we can inspect it. Therefore, a model like ChatGPT is not a good candidate because we cannot look into it; we only get its final output. The model has to be open source for our method to work.”
A more unified approach
Cinà and Pezzelle, who come from different academic backgrounds – medical AI and natural language processing (NLP), respectively – have joined forces in order to develop a method that can be applied to various domains. Pezzelle: “Currently, solutions that are proposed in one of these disciplines do not necessarily reach the other field. So our aim is to create a more unified approach.”