Transformers-interpret

Latest version: v0.10.0


0.16164146624851905

('the', 0.5026975657258089),
('new', 0.052589263167955536),
('Mac', 0.2528325960993759),
('book', -0.06445090203729663),
('showing', -0.21204922293777534),
('off', 0.06319714817612732),
('a', 0.032048012090796815),
('range', 0.08553079346908955),
('of', 0.1409201107994034),
('new', 0.0515261917112576),
('features', -0.09656406466213506),
('found', 0.02336613296843605),
('in', -0.0011649894272190678),
('the', 0.14229640664777807),
('proprietary', -0.23169065661847646),
('silicon', 0.5963924257008087),
('chip', -0.19908474233975806),
('computer', 0.030620295844734646),
('.', 0.1995076958535378)]


We can find out which label was predicted with:

python
>>> zero_shot_explainer.predicted_label
'technology (entailment)'

For the `ZeroShotClassificationExplainer`, the `visualize()` method returns a table similar to the one produced by the `SequenceClassificationExplainer`.

python
zero_shot_explainer.visualize("zero_shot.html")

<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/zero_shot_example.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/zero_shot_example.png" width="150%" height="150%" align="center" />
</a>

Custom Labels For Sequence Classification - lalitpagaria (25, 41)

This contribution by lalitpagaria adds the ability to pass custom class labels that replace the default labels found in the model's config.

This is a very useful addition, as it is quite common for popular trained models not to have their label names set, resulting in labels that look like `"LABEL_0", "LABEL_1",...`. This can make the sequence classification explainer's visualization hard to understand and not very readable.

Custom labels are passed to the `SequenceClassificationExplainer`'s constructor, and the number of labels passed must equal the number of existing labels:

python
seq_explainer = SequenceClassificationExplainer(
    DISTILBERT_MODEL, DISTILBERT_TOKENIZER, custom_labels=["sad", "happy"]
)


Now the class at the 0th index corresponds to the label "sad" and the class at the 1st index corresponds to "happy".
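
With the custom labels in place, the explainer reports predictions and attributions using the new names. A minimal sketch of what that looks like (assuming `DISTILBERT_MODEL` and `DISTILBERT_TOKENIZER` are an already loaded two-class sentiment model and its tokenizer, and that the explainer exposes a `predicted_class_name` attribute; check your installed version):

python
# Hypothetical example text; attributions are returned as (word, score) tuples.
word_attributions = seq_explainer("I had a wonderful day at the beach.")

# The predicted class is now reported with the custom label, e.g. "happy".
print(seq_explainer.predicted_class_name)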

This is a really nice addition that makes Transformers Interpret more usable with a whole range of sequence classification models that don't have labels set. Thanks, lalitpagaria.

General Cleanup and Housekeeping
* Cleaned up a number of flake8 reported linting errors
* Improved the docstring for the QA explainer (more to come; it still needs some finalization)
* Added some increased testing coverage
* Added a contribution guideline

0.10.0

See 107 for the motivation behind this fix.

0.9.5

This is a hugely exciting release for us as it is our first foray into the domain of computer vision. With this update, we are adding support for [image classification models inside the Huggingface Transformers ecosystem](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads). We are very excited to bring a simple API for calculating and visualizing attributions for vision transformers and their numerous variants in just 3 lines of code.

ImageClassificationExplainer (105)

The `ImageClassificationExplainer` is designed to work with all models from the Transformers library that are trained for image classification (Swin, ViT, etc.). It provides attributions for every pixel in an image, which can be easily visualized using the explainer's built-in `visualize` method.

Initialising an image classification explainer is very simple: all you need is an image classification model fine-tuned or trained to work with Huggingface, and its feature extractor.

For this example we are using `google/vit-base-patch16-224`, a Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on ImageNet, which predicts one of 1,000 possible classes.

python
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
from transformers_interpret import ImageClassificationExplainer
from PIL import Image
import requests

model_name = "google/vit-base-patch16-224"
model = AutoModelForImageClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# With both the model and feature extractor initialized we are now able to get
# explanations for an image; we will use a simple image of a golden retriever.
image_link = "https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F47%2F2020%2F08%2F16%2Fgolden-retriever-177213599-2000.jpg"

image = Image.open(requests.get(image_link, stream=True).raw)

image_classification_explainer = ImageClassificationExplainer(model=model, feature_extractor=feature_extractor)

image_attributions = image_classification_explainer(
    image
)

print(image_attributions.shape)


Which will print the shape of the attributions tensor:

python
>>> torch.Size([1, 3, 224, 224])
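
The attributions come back as a (batch, channel, height, width) tensor matching the model input. If you want a single score per pixel, one option is to aggregate over the channel dimension; this is a hypothetical post-processing sketch, not part of the explainer's API:

python
# Drop the batch dimension and sum the per-channel attributions for each pixel.
per_pixel_attributions = image_attributions.squeeze(0).sum(dim=0)
print(per_pixel_attributions.shape)  # torch.Size([224, 224])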


Visualizing Image Attributions

Because we are dealing with images, visualization is even more straightforward than with text models.

Attributions can be easily visualized using the `visualize` method of the explainer. There are currently 4 supported visualization methods.

- `heatmap` - a heatmap of positive and negative attributions is drawn using the dimensions of the image.
- `overlay` - the heatmap is overlaid on a grayscale version of the original image.
- `masked_image` - the absolute value of attributions is used to create a mask over the original image.
- `alpha_scaling` - the alpha channel (transparency) of each pixel is set to the normalized attribution value.

Heatmap

python
image_classification_explainer.visualize(
    method="heatmap",
    side_by_side=True,
    outlier_threshold=0.03
)


<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/heatmap_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/heatmap_sbs.png" width="100%" height="100%" align="center"/>
</a>

Overlay

python
image_classification_explainer.visualize(
    method="overlay",
    side_by_side=True,
    outlier_threshold=0.03
)


<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/overlay_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/overlay_sbs.png" width="100%" height="100%" align="center"/>
</a>


Masked Image

python
image_classification_explainer.visualize(
    method="masked_image",
    side_by_side=True,
    outlier_threshold=0.03
)


<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/masked_image_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/masked_image_sbs.png" width="100%" height="100%" align="center"/>
</a>


Alpha Scaling

python
image_classification_explainer.visualize(
    method="alpha_scaling",
    side_by_side=True,
    outlier_threshold=0.03
)


<a href="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/alpha_scaling_sbs.png">
<img src="https://github.com/cdpierse/transformers-interpret/blob/master/images/vision/alpha_scaling_sbs.png" width="100%" height="100%" align="center"/>

</a>

0.8.1

Lots of changes, big and small, in this release:

PairwiseSequenceClassificationExplainer (87, 82, 58)

This has been a fairly requested feature and one that I am very happy to release, especially as I have recently wanted to explain the outputs of [CrossEncoder](https://www.sbert.net/docs/pretrained_cross-encoders.html) models myself.

The `PairwiseSequenceClassificationExplainer` is a variant of the `SequenceClassificationExplainer` designed to work with classification models that expect the input sequence to be two inputs separated by the model's separator token. Common examples of this are [NLI models](https://arxiv.org/abs/1705.02364) and [Cross-Encoders](https://www.sbert.net/docs/pretrained_cross-encoders.html), which are commonly used to score the similarity of two inputs to one another.

This explainer calculates pairwise attributions for two passed inputs, `text1` and `text2`, using the model and tokenizer given in the constructor.

Also, since a common use case for pairwise sequence classification is comparing the similarity of two inputs, models of this nature typically have a single output node rather than one per class. The pairwise explainer therefore includes some utilities to make interpreting single-node outputs clearer.

By default, for models that output a single node, the attributions are with respect to the inputs pushing the score closer to 1.0; if you want to see the attributions with respect to the score being closer to 0.0, you can pass `flip_sign=True` when calling the explainer. For similarity-based models this is useful, as the model might predict a score close to 0.0 for the two inputs, in which case flipping the attributions' sign explains why the inputs are dissimilar.

Example Usage
For this example we are using `"cross-encoder/ms-marco-MiniLM-L-6-v2"`, a high-quality cross-encoder trained on the [MSMarco dataset](https://github.com/microsoft/MSMARCO-Passage-Ranking), a passage-ranking dataset for question answering and machine reading comprehension.

python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

from transformers_interpret.explainers.sequence_classification import PairwiseSequenceClassificationExplainer

model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairwise_explainer = PairwiseSequenceClassificationExplainer(model, tokenizer)

# The pairwise explainer requires two string inputs. In this case, given the nature of
# the model, we pass a query string and a context string. The question we are asking of
# our model is "does this context contain a valid answer to our question?" - the higher
# the score, the better the fit.

query = "How many people live in Berlin?"
context = "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."
pairwise_attr = pairwise_explainer(query, context)


Which returns the following attributions:

python
>>> pairwise_attr
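
Since this cross-encoder has a single output node, we can also request attributions with respect to the score being pushed towards 0.0 by passing `flip_sign=True`, as described above. A minimal sketch reusing the explainer and inputs from the example:

python
# Flip the attribution sign to explain why the two inputs would be scored as dissimilar.
dissimilar_attr = pairwise_explainer(query, context, flip_sign=True)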

0.7.2

TokenClassificationExplainer (91)

This incredible release is all thanks to a fantastic community contribution from pabvald, who implemented the entire `TokenClassificationExplainer` class, as well as all of its tests and associated docs. A huge thank you again to Pablo for this amazing work; it has been on my to-do list for over a year, and I greatly appreciate this contribution, as I know the community will too.

This new explainer is designed to work with any and all models in the Hugging Face Transformers package of the kind `{Model}ForTokenClassification`, which are models commonly used for tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging.

The `TokenClassificationExplainer` returns a dictionary mapping each word in a given sequence to a label from the model's trained label configuration. Token classification models work on a word-by-word basis, so in this explainer's output each word maps to another dictionary with two keys, `label` and `attribution_scores`, where `label` is a string indicating the predicted label and `attribution_scores` is a list of (word, score) tuples for the given root word.

How to use

python
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers_interpret import TokenClassificationExplainer

MODEL_PATH = 'dslim/bert-base-NER'
model = AutoModelForTokenClassification.from_pretrained(MODEL_PATH)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

ner_explainer = TokenClassificationExplainer(model=model, tokenizer=tokenizer)

sample_text = "Tim Cook is CEO of Apple."
attributions = ner_explainer(sample_text)

print(attributions)




<details><summary>Expand to see word attribution dictionary</summary>

python
{'[CLS]': {'label': 'O',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.346423320984119),
('Cook', 0.5334609978768102),
('is', -0.40334870049983335),
('CEO', -0.3101234375976895),
('of', 0.512072192130804),
('Apple', -0.17249370683345489),
('.', 0.21111967418861474),
('[SEP]', 0.0)]},
'Tim': {'label': 'B-PER',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.6097200124017794),
('Cook', 0.7418433507979225),
('is', 0.2277328676307869),
('CEO', 0.12913824237676577),
('of', 0.0658425121482477),
('Apple', 0.06830320263790929),
('.', -0.01924683905463743),
('[SEP]', 0.0)]},
'Cook': {'label': 'I-PER',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.5523936725613293),
('Cook', 0.8009957951991128),
('is', 0.1804967026709793),
('CEO', 0.12327788007775593),
('of', 0.042470529981614845),
('Apple', 0.057217721910403266),
('.', -0.020318897077615642),
('[SEP]', 0.0)]},
'is': {'label': 'O',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.24614651317657982),
('Cook', -0.009088703281476993),
('is', 0.9216954069405697),
('CEO', 0.026992140219729874),
('of', 0.2520559406534854),
('Apple', -0.09920548911190433),
('.', 0.12531705560714215),
('[SEP]', 0.0)]},
'CEO': {'label': 'O',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.3124910273039106),
('Cook', 0.3625517589427658),
('is', 0.3507524148134499),
('CEO', 0.37196988201878567),
('of', 0.645668212957734),
('Apple', -0.27458958091134866),
('.', 0.13126252757894524),
('[SEP]', 0.0)]},
'of': {'label': 'O',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.021065140560775575),
('Cook', 0.05638048932919909),
('is', 0.16774739397504396),
('CEO', 0.043009122581603866),
('of', 0.9340829137500298),
('Apple', -0.11144488868920191),
('.', 0.2854079089492836),
('[SEP]', 0.0)]},
'Apple': {'label': 'B-ORG',
'attribution_scores': [('[CLS]', 0.0),
('Tim', -0.017330599088927878),
('Cook', -0.04074196463435918),
('is', -0.08738080703156076),
('CEO', 0.23234519803002726),
('of', 0.12270125701886334),
('Apple', 0.9561624229708163),
('.', -0.08436746169241069),
('[SEP]', 0.0)]},
'.': {'label': 'O',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.052863660537099254),
('Cook', -0.0694824371223385),
('is', -0.18074653059003534),
('CEO', 0.021118463602210605),
('of', 0.06322422431822372),
('Apple', -0.6286955666244136),
('.', 0.748336093254276),
('[SEP]', 0.0)]},
'[SEP]': {'label': 'O',
'attribution_scores': [('[CLS]', 0.0),
('Tim', 0.29980967625881066),
('Cook', -0.22297477338851293),
('is', -0.050889312336460345),
('CEO', 0.11157068443843984),
('of', 0.25200059104116196),
('Apple', -0.8839047143031845),
('.', -0.023808126035021283),
('[SEP]', 0.0)]}}

</details>

Visualizing explanations

With a single call to the `visualize()` method we get a nice inline display of which inputs caused the activations that led to classifying each token into a particular class.
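
A minimal sketch of that call (the html file argument is optional and follows the same pattern as the other explainers shown above; the file name here is just an example):

python
# Render the word attribution tables inline (e.g. in a notebook) and save them to html.
ner_explainer.visualize("ner_example.html")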


<img width="760" alt="Screenshot 2022-06-18 at 17 47 39" src="https://user-images.githubusercontent.com/8831892/174448562-f1230267-5e95-46f5-b9e8-d63c23692441.png">


Ignore indexes

To save computation time, we can indicate a list of token indexes to ignore. The explainer will not compute explanations for these tokens, although their attributions will still be calculated when explaining the predictions for other tokens.

python
attributions_2 = ner_explainer(sample_text, ignored_indexes=[0, 3, 4, 5])


When we visualize these attributions, the result is much more concise:


<img width="761" alt="Screenshot 2022-06-18 at 17 52 09" src="https://user-images.githubusercontent.com/8831892/174448695-c547cfbc-dc3c-4e61-ab6f-5a29b6ec23e7.png">


Ignore labels

In a similar way, we can also tell the explainer to ignore certain labels; for example, we might not be interested in seeing the explanations for tokens classified as 'O'.

python
attributions_3 = ner_explainer(sample_text, ignored_labels=['O'])


Which results in:

<img width="759" alt="Screenshot 2022-06-18 at 17 53 53" src="https://user-images.githubusercontent.com/8831892/174448728-56e659b9-b192-4556-8440-d71095e3e81a.png">

0.6.0

MultiLabelClassificationExplainer (79)

Extends the existing sequence classification explainer into a new explainer that independently produces attributions for each label in the model regardless of what the predicted class is. This allows users to better inspect and interpret model predictions across all classes, particularly in situations where classifiers might be used in a multilabel fashion.

The `MultiLabelClassificationExplainer` returns a dictionary mapping labels/classes to a list of word attributions; additionally, the `visualize()` method displays the entire table of attributions for each label.

This has been a very requested feature for a number of months, so we're very happy to (finally) get it released.

CC: MichalMalyska rhettdsouza13 fraserprice JensVN98 dheerajiiitv


How to use

This explainer is an extension of the `SequenceClassificationExplainer` and is thus compatible with all sequence classification models from the Transformers package. The key change in this explainer is that it calculates attributions for each label in the model's config and returns a dictionary of word attributions with respect to each label. The `visualize()` method also displays a table of attributions, with attributions calculated per label.

python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import MultiLabelClassificationExplainer

model_name = "j-hartmann/emotion-english-distilroberta-base"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


cls_explainer = MultiLabelClassificationExplainer(model, tokenizer)


word_attributions = cls_explainer("There were many aspects of the film I liked, but it was frightening and gross in parts. My parents hated it.")

This produces a dictionary of word attributions mapping each label to a list of tuples containing each word and its attribution score.
<details><summary>Click to see word attribution dictionary</summary>

python
>>> word_attributions
