Modern Deep Neural Networks (DNNs) are inherently opaque, making it difficult to understand their decision-making processes. This lack of transparency hinders the broad adoption of machine learning techniques across domains. In response, Explainable AI (XAI) has emerged to make DNN decisions more understandable to humans. Beyond local explanation methods, which use saliency maps to explain individual predictions, XAI now also aims to understand the global behavior of DNNs by examining the functional role of each of their components.
Mechanistic interpretability is a global explainability approach that seeks to identify the specific concepts that neurons, the fundamental computational units of a neural network, have learned to recognize. By labeling neurons with human-understandable descriptions, we can explain how a network's latent representations operate; a minimal sketch of how such a neuron can be probed is shown below. These descriptions have evolved from simple labels to rich, compositional, open-vocabulary explanations. However, the absence of standardized quantitative metrics for assessing such open-vocabulary descriptions has made thorough comparisons across methods difficult.
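The sketch below illustrates the basic probing step that neuron-labeling methods build on: rank a dataset by how strongly one unit activates, then summarize the top-activating images in text. It is an assumed, simplified example, not code from any of the papers; the ResNet-50 model, the choice of layer4, the unit index, and the "path/to/images" dataset path are all illustrative placeholders.

```python
# Hypothetical sketch: find the images that most strongly activate one neuron,
# the raw material that textual explanation methods then describe in words.
import torch
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

net = models.resnet50(weights="IMAGENET1K_V2").eval()

# Forward hook that stores the activations of the chosen layer.
acts = {}
net.layer4.register_forward_hook(lambda m, i, o: acts.update(val=o.detach()))

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# "path/to/images" is a placeholder probing dataset.
loader = DataLoader(datasets.ImageFolder("path/to/images", preprocess), batch_size=32)

unit = 42  # illustrative neuron (channel) index
scores = []
with torch.no_grad():
    for images, _ in loader:
        net(images)
        # One scalar per image: spatial mean of this unit's feature map.
        scores.append(acts["val"][:, unit].mean(dim=(1, 2)))

top_idx = torch.cat(scores).topk(10).indices  # indices of top-activating images
```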
To address this gap, researchers from ATB Potsdam, University of Potsdam, TU Berlin, Fraunhofer Heinrich-Hertz-Institute, and BIFOLD introduce CoSy, a quantitative evaluation framework for assessing open-vocabulary textual explanations of neurons in computer vision (CV) models. The method leverages advances in generative AI to synthesize images that correspond to concept-based textual descriptions. By generating data points representative of a given target explanation, CoSy enables quantitative comparisons of different concept-based textual explanation methods without requiring human involvement.
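The following sketch outlines this evaluation idea under stated assumptions; it is not the authors' implementation. The Stable Diffusion checkpoint, the ResNet-50 target model, the choice of layer4, the unit index, and the AUC-based scoring are all illustrative choices made for the example.

```python
# Hedged sketch of a CoSy-style evaluation loop: synthesize images that depict
# a neuron's textual explanation, then test whether the neuron activates more
# strongly on them than on unrelated control images.
import torch
from diffusers import StableDiffusionPipeline
from torchvision import models, transforms
from sklearn.metrics import roc_auc_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text-to-image generator used to realize the textual explanation as images.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# Target vision model whose neuron explanation we want to evaluate.
net = models.resnet50(weights="IMAGENET1K_V2").eval().to(device)
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Capture the chosen layer's activations via a forward hook.
acts = {}
def hook(_module, _inputs, output):
    acts["val"] = output.detach()
net.layer4.register_forward_hook(hook)

def neuron_activation(img, unit):
    """Scalar activation of one unit for a single PIL image."""
    x = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        net(x)
    return acts["val"][0, unit].mean().item()  # spatial mean of the feature map

def cosy_score(explanation, control_images, unit=42, n_synthetic=8):
    """Compare the neuron's activations on synthetic concept images vs. controls."""
    synthetic = pipe(explanation, num_images_per_prompt=n_synthetic).images
    a_concept = [neuron_activation(img, unit) for img in synthetic]
    a_control = [neuron_activation(img, unit) for img in control_images]
    labels = [1] * len(a_concept) + [0] * len(a_control)
    # AUC near 1.0: the explanation reliably triggers the neuron;
    # near 0.5: the explanation does no better than random images.
    return roc_auc_score(labels, a_concept + a_control)
```

Because the score depends only on the target network's activations, the same loop can be run for any explanation method's output, which is what makes method-to-method comparisons possible without a human in the loop.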
Through an extensive meta-analysis, the research team demonstrated that CoSy provides reliable evaluations of explanations. The study found that concept-based textual explanation methods perform best in the upper layers of neural networks, where high-level concepts are learned. Methods such as INVERT, which generates images from neural network representations, and CLIP-Dissect, which examines internal network representations, produce high-quality neuron explanations. In contrast, methods such as MILAN and FALCON yield lower-quality explanations, occasionally assigning near-random concepts, which can lead to incorrect conclusions about the network.
A key limitation of CoSy, as the researchers acknowledge, is that certain categories from the training data may not be covered by the generative model, leaving explanations that are too general or ambiguous, such as "white objects," difficult to synthesize faithfully. Generation accuracy could be improved by examining pre-training datasets and model performance. Even so, CoSy shows considerable promise in the still-developing field of evaluating non-local explanation methods.
Looking ahead, the team is optimistic about CoSy's potential applications across multiple fields. In future work, they aim to incorporate human judgment into the definition of explanation quality, so that the plausibility or quality of an explanation can be evaluated with respect to the outcome of a downstream task. They also plan to extend the evaluation framework to additional domains such as natural language processing and healthcare. The prospect of applying CoSy to evaluate large, opaque language models (LLMs) is particularly exciting. According to the researchers, applying CoSy to healthcare datasets, where explanation quality is critical, could be a significant step forward. These future applications hold great promise for advancing AI research.