Publication | Open Access

Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior

Citations: 48
References: 17
Year: 2022

TLDR

Saliency methods are commonly used to identify important input features in neural networks, but interpreting them requires tedious manual inspection that often leads to ad hoc or cherry‑picked analyses. This work introduces Shared Interest, a set of metrics that compare model reasoning via saliency to human reasoning via ground‑truth annotations. Shared Interest provides quantitative descriptors that enable ranking, sorting, and aggregation of inputs, facilitating large‑scale systematic analysis of model behavior. Applying Shared Interest revealed eight recurring patterns in model behavior and showed that it can help determine model trustworthiness, uncover issues missed by manual analysis, and support interactive probing with real‑world users.

Abstract

Saliency methods — techniques to identify the importance of input features on a model's output — are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for comparing model reasoning (via saliency) to human reasoning (via ground truth annotations). By providing quantitative descriptors, Shared Interest enables ranking, sorting, and aggregating inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, such as cases where contextual features or a subset of ground truth features are most important to the model. Working with representative real-world users, we show how Shared Interest can be used to decide if a model is trustworthy, uncover issues missed in manual analyses, and enable interactive probing.
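The abstract does not spell out how the metrics are computed, so the following is only a minimal sketch of the general idea, assuming that both the salient features and the ground-truth annotation for an input can be represented as sets of feature indices (for example, pixel or token positions) and that agreement is scored by set overlap. The names `OverlapScores`, `overlap_scores`, and the example inputs are hypothetical and chosen for illustration. It shows how a few per-input scores make ranking and aggregation straightforward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlapScores:
    """Overlap between salient features S and ground-truth features G (illustrative)."""
    iou: float               # |S ∩ G| / |S ∪ G|
    ground_truth_cov: float  # |S ∩ G| / |G|: share of the annotation the model attends to
    saliency_cov: float      # |S ∩ G| / |S|: share of the saliency inside the annotation


def overlap_scores(salient, ground_truth):
    """Score one input; empty denominators are mapped to 0.0 by assumption."""
    s, g = set(salient), set(ground_truth)
    inter = len(s & g)
    union = len(s | g)
    return OverlapScores(
        iou=inter / union if union else 0.0,
        ground_truth_cov=inter / len(g) if g else 0.0,
        saliency_cov=inter / len(s) if s else 0.0,
    )


# Hypothetical inputs: (salient feature indices, annotated feature indices).
examples = {
    "img_a": ({1, 2, 3, 4}, {1, 2, 3, 4}),  # saliency matches the annotation
    "img_b": ({1, 2}, {1, 2, 3, 4}),        # saliency covers a subset of the annotation
    "img_c": ({7, 8, 9}, {1, 2, 3, 4}),     # saliency falls entirely on context
}

scores = {name: overlap_scores(s, g) for name, (s, g) in examples.items()}

# Rank inputs from least to most aligned, then aggregate over the dataset.
ranked = sorted(scores, key=lambda name: scores[name].iou)
mean_iou = sum(sc.iou for sc in scores.values()) / len(scores)
```

Under these assumptions, an input like `img_b` (low ground-truth coverage, full saliency coverage) corresponds to the case the abstract describes where a subset of ground truth features matters most to the model, while `img_c` corresponds to reliance on contextual features.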
