The v0.8.0 release of Captum offers new influence functions for data attribution, improvements to feature attribution methods (including LLM prompt attribution), enhanced type annotations for modern Python type checking, and a variety of other small changes. Note that support for Python 3.8 and PyTorch 1.10 has been dropped, and Captum Insights will be deprecated in the next major release.
## Data Attribution: New Influence Functions
This version offers two different implementations that both calculate the "infinitesimal" influence score as defined in the paper ["Understanding Black-box Predictions via Influence Functions"](https://arxiv.org/pdf/1703.04730.pdf).
- `NaiveInfluenceFunction`: a computationally slow but exact implementation that is useful for obtaining "ground truth" (though note that influence scores are themselves only an approximation of the effect of removing a training example and retraining). Several papers use this approach, e.g. ["Learning Augmentation Network via Influence Functions"](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lee_Learning_Augmentation_Network_via_Influence_Functions_CVPR_2020_paper.pdf), ["Quantifying and Mitigating the Impact of Label Errors on Model Disparity Metrics"](https://openreview.net/forum?id=RUzSobdYy0V), and ["Achieving Fairness at No Utility Cost via Data Reweighting with Influence"](https://proceedings.mlr.press/v162/li22p/li22p.pdf) (PR https://github.com/pytorch/captum/pull/1214)
- `ArnoldiInfluenceFunction`: a computationally efficient implementation described in the paper ["Scaling Up Influence Functions"](https://arxiv.org/pdf/2112.03052.pdf) by Schioppa et al. (PR https://github.com/pytorch/captum/pull/1187)
Example:
```python
from captum.influence._core.influence_function import NaiveInfluenceFunction
from torch import nn
from torch.utils.data import DataLoader

train_dl = DataLoader(your_dataset, batch_size=8)  # your dataloader
criterion = nn.MSELoss(reduction="none")

influence = NaiveInfluenceFunction(
    net,
    train_dl,
    checkpoint_path,  # path to your model checkpoint
    loss_fn=criterion,
    batch_size=batch_size,
)

# compute pairwise influences using the influence implementation
influence_train_test_influences = influence.influence(
    (test_samples, test_labels)  # your test data (Tensors)
)
```
### What is the "infinitesimal" influence score?
This "infinitesimal" influence score approximately answers the question: if a given training example were infinitesimally down-weighted and the model re-trained to optimality, how much would the loss on a given test example change? Mathematically, this influence score is given by $\nabla_\theta L(x)' H^{-1} \nabla_\theta L(z)$, where $\nabla_\theta L(x)$ is the gradient of the loss on a training example $x$ with respect to (a subset of) the model parameters $\theta$, $\nabla_\theta L(z)$ is the analogous quantity for a test example $z$, and $H$ is the Hessian of the loss with respect to the (subset of) model parameters at a given model checkpoint.
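As a concrete illustration of this formula, here is a minimal, self-contained sketch that computes the score directly for a toy linear model; the model, data, and variable names are made up for illustration and are not part of the Captum API.

```python
import torch

torch.manual_seed(0)

# Toy setup: parameters theta of a small linear model, with squared-error loss
theta = torch.randn(3, requires_grad=True)

def loss(example, params):
    features, label = example
    return 0.5 * (params @ features - label) ** 2

# Hypothetical training set and a single test example
train_examples = [(torch.randn(3), torch.randn(())) for _ in range(10)]
test_example = (torch.randn(3), torch.randn(()))

# Hessian H of the total training loss with respect to the parameters
def total_loss(params):
    return sum(loss(ex, params) for ex in train_examples)

H = torch.autograd.functional.hessian(total_loss, theta.detach())

def grad_loss(example):
    # Gradient of the loss on a single example w.r.t. the parameters
    g, = torch.autograd.grad(loss(example, theta), theta)
    return g

# Influence of each training example x on the test example z:
# grad L(x)' H^{-1} grad L(z)
g_test = grad_loss(test_example)
influences = [grad_loss(x) @ torch.linalg.solve(H, g_test) for x in train_examples]
print(influences)
```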
### What the two implementations have in common
Both implementations compute a low-rank approximation of the inverse Hessian, i.e. a tall and skinny matrix $R$ with width $k$ such that $H^{-1} \approx RR'$, where $k$ is small. In particular, let $L$ be the matrix of width $k$ whose columns contain the top-$k$ eigenvectors of $H$, and let $V$ be the $k \times k$ diagonal matrix whose diagonal contains the corresponding eigenvalues. Both implementations let $R = LV^{-1/2}$, so that $RR' = LV^{-1}L' \approx H^{-1}$. Thus, the core computational step is computing the top-$k$ eigenvalues / eigenvectors (a small numerical sketch of this approximation appears after the list below).
This approximation is useful for several reasons:
- It avoids numerical issues associated with inverting small eigenvalues
- Since the influence score is given by $\nabla_\theta L(x)' H^{-1} \nabla_\theta L(z)$, which is approximated by $(\nabla_\theta L(x)' R)(\nabla_\theta L(z)' R)'$, we can compute an "influence embedding" for a given example $x$, $\nabla_\theta L(x)' R$, such that the influence score of one example on another is approximately the dot product of their respective embeddings. Because $k$ is small, e.g. 50, these influence embeddings are low-dimensional.
- Even for large models, we can store $R$ in memory, provided $k$ is small. This means influence embeddings (and thus influence scores) can be efficiently computed by doing a backwards pass to compute $\nabla_\theta L(x)$ and then multiplying by $R'$. This is orders of magnitude faster than the earlier LiSSA approach of Koh et al., which, to compute the influence score involving a given example, needs to compute Hessian-vector products involving on the order of $10^4$ examples.
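Below is a minimal numerical sketch of this low-rank scheme, using a random symmetric positive-definite matrix as a stand-in for the Hessian and random vectors as stand-ins for gradients; none of the names here come from the Captum API.

```python
import torch

torch.manual_seed(0)
d, k = 100, 5  # parameter dimension, rank of the approximation

# Stand-in for the Hessian: a random symmetric positive-definite matrix
A = torch.randn(d, d)
H = A @ A.T + d * torch.eye(d)

# Top-k eigendecomposition: columns of L_mat are eigenvectors, V the eigenvalues
eigenvalues, eigenvectors = torch.linalg.eigh(H)  # ascending order
L_mat = eigenvectors[:, -k:]                      # top-k eigenvectors, shape (d, k)
V = eigenvalues[-k:]                              # top-k eigenvalues, shape (k,)

# R = L V^{-1/2}, so that R R' = L V^{-1} L' approximates H^{-1}
R = L_mat * V.rsqrt()

# Influence embeddings: grad' R is only k-dimensional per example
grad_x = torch.randn(d)  # stand-in for a training-example gradient
grad_z = torch.randn(d)  # stand-in for a test-example gradient
emb_x = grad_x @ R
emb_z = grad_z @ R

approx_influence = emb_x @ emb_z                  # dot product of embeddings
exact_influence = grad_x @ torch.linalg.solve(H, grad_z)
print(approx_influence.item(), exact_influence.item())
```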
The implementations differ in how they compute the top-k eigenvalues / eigenvectors.
### How `NaiveInfluenceFunction` computes the top-k eigenvalues / eigenvectors
It is "naive" in that it computes the top-k eigenvalues / eigenvectors by explicitly forming the Hessian, converting it to a 2D tensor, computing its eigenvectors / eigenvalues, and then sorting. See documentation of the `_set_projections_naive_influence_function` method for more details.
### How `ArnoldiInfluenceFunction` computes the top-k eigenvalues / eigenvectors
The key novelty of the approach by Schioppa et al. is that it uses the Arnoldi iteration to find the top-k eigenvalues / eigenvectors of the Hessian without explicitly forming the Hessian. In more detail, the approach first runs the Arnoldi iteration, which only requires the ability to compute Hessian-vector products, to find a Krylov subspace of moderate dimension, e.g. 200. It then finds the top-k eigenvalues / eigenvectors of the restriction of the Hessian to that subspace, where k is small, e.g. 50. Finally, it expresses those eigenvectors in the original basis. This approach is justified by a property of the Arnoldi iteration: the Krylov subspace it returns tends to contain the top eigenvectors.
This implementation does incur some one-time overhead in `__init__`, where it runs the Arnoldi iteration to calculate $R$. After that overhead, calculating influence scores is fast, requiring only a backwards pass and a multiplication per example.
Unlike `NaiveInfluenceFunction`, this implementation does not flatten any parameters, as the 2D Hessian is never formed, and PyTorch's Hessian-vector product implementation (`torch.autograd.functional.hvp`) allows the input and output vectors to be tuples of tensors. Avoiding flattening / unflattening parameters brings scalability gains.
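For intuition, here is a minimal sketch of the Arnoldi idea using only Hessian-vector products via `torch.autograd.functional.hvp`; it is not the Captum implementation, and the toy loss, dimensions, and helper names are illustrative.

```python
import torch

torch.manual_seed(0)
d = 50
xs, ys = torch.randn(64, d), torch.randn(64)

def total_loss(w):
    return torch.nn.functional.mse_loss(xs @ w, ys)

w0 = torch.randn(d)  # point at which the (implicit) Hessian is taken

def hvp(v):
    # Hessian-vector product without ever forming the d x d Hessian
    _, hv = torch.autograd.functional.hvp(total_loss, w0, v)
    return hv

def arnoldi(hvp_fn, dim, n_iter):
    # Build an orthonormal basis Q of a Krylov subspace and the small projected
    # matrix T = Q' H Q (for a symmetric H this is essentially Lanczos)
    Q = [torch.randn(dim)]
    Q[0] = Q[0] / Q[0].norm()
    T = torch.zeros(n_iter + 1, n_iter)
    for j in range(n_iter):
        v = hvp_fn(Q[j])
        for i in range(j + 1):
            T[i, j] = Q[i] @ v
            v = v - T[i, j] * Q[i]
        T[j + 1, j] = v.norm()
        Q.append(v / T[j + 1, j])
    return torch.stack(Q[:-1], dim=1), T[:n_iter, :]  # (dim, n_iter), (n_iter, n_iter)

Q, T = arnoldi(hvp, d, n_iter=20)

# Top-k eigenpairs of the restriction, expressed back in the original basis
k = 5
vals, vecs = torch.linalg.eigh((T + T.T) / 2)  # symmetrize against numerical error
top_eigenvalues = vals[-k:]
top_eigenvectors = Q @ vecs[:, -k:]
print(top_eigenvalues)
```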
## Feature Attribution Improvements
- Added initial support for asynchronous attribution (PyTorch [futures](https://pytorch.org/docs/stable/futures.html)) for the following methods (PRs https://github.com/pytorch/captum/pull/1295, https://github.com/pytorch/captum/pull/1316, https://github.com/pytorch/captum/pull/1317, https://github.com/pytorch/captum/pull/1314, https://github.com/pytorch/captum/pull/1320, https://github.com/pytorch/captum/pull/1326, https://github.com/pytorch/captum/pull/1335, https://github.com/pytorch/captum/pull/1487):
- FeatureAblation
- FeaturePermutation
- ShapleyValueSampling
- ShapleyValues
- Added support for additional gradient-based LLM attribution methods (PRs https://github.com/pytorch/captum/pull/1337, https://github.com/pytorch/captum/pull/1420):
- LayerGradientXActivation
- LayerGradientShap
- Added support to perturbation-based LLM attribution for “key and value” [caching](https://huggingface.co/docs/transformers/main/en/kv_cache) (PRs https://github.com/pytorch/captum/pull/1224, https://github.com/pytorch/captum/pull/1341, https://github.com/pytorch/captum/pull/1343, https://github.com/pytorch/captum/pull/1353)
- Added support for passing gradient keyword arguments to the following `captum.attr` methods through the `grad_kwargs` argument (PRs https://github.com/pytorch/captum/pull/1286, https://github.com/pytorch/captum/pull/1294, https://github.com/pytorch/captum/pull/1435):
- LayerGradCam
- InternalInfluence
- LayerConductance
- LayerDeepLift
- LayerGradientShap
- NeuronConductance
- LayerGradientXActivation
- LayerIntegratedGradients
- Added a tutorial for perturbation- and gradient-based LLM attribution (tutorials/Llama2_LLM_Attribution.ipynb) (PRs https://github.com/pytorch/captum/pull/1228, https://github.com/pytorch/captum/pull/1333, https://github.com/pytorch/captum/pull/1445)
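As an illustration of the perturbation-based LLM attribution workflow covered in that tutorial, the sketch below loosely follows it; the model checkpoint, prompt template, and target text are placeholders, and the method names should be checked against the current `captum.attr` API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import FeatureAblation, LLMAttribution, TextTemplateInput

# Placeholder checkpoint; the tutorial uses a Llama 2 chat model
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Wrap a perturbation-based attribution method for prompt-level attribution
llm_attr = LLMAttribution(FeatureAblation(model), tokenizer)

# Attribute the target text to the interchangeable parts of the prompt template
inp = TextTemplateInput(
    template="{} lives in {}, {} and is a {}. Personal interests include",
    values=["Dave", "Palm Coast", "FL", "lawyer"],
)
attr_result = llm_attr.attribute(inp, target="playing golf, hiking, and cooking.")
attr_result.plot_token_attr(show=True)  # visualize per-token attributions
```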

## Changes to Requirements
- We have dropped support for Python < 3.8 and PyTorch < 1.10 (PRs https://github.com/pytorch/captum/pull/1460, https://github.com/pytorch/captum/pull/1298, https://github.com/pytorch/captum/pull/1305)
- We plan to deprecate Captum Insights in the next major release (PR https://github.com/pytorch/captum/pull/1498)
## Improvements to Type Annotations
Greatly improved typing throughout the library, now supporting and complying with the latest versions of both pyre and mypy type checking (PRs https://github.com/pytorch/captum/pull/1371, https://github.com/pytorch/captum/pull/1318, https://github.com/pytorch/captum/pull/1319, https://github.com/pytorch/captum/pull/1324, https://github.com/pytorch/captum/pull/1247, https://github.com/pytorch/captum/pull/1270, https://github.com/pytorch/captum/pull/1299, https://github.com/pytorch/captum/pull/1330, https://github.com/pytorch/captum/pull/1356, https://github.com/pytorch/captum/pull/1359, https://github.com/pytorch/captum/pull/1377, https://github.com/pytorch/captum/pull/1389, https://github.com/pytorch/captum/pull/1381, https://github.com/pytorch/captum/pull/1382, https://github.com/pytorch/captum/pull/1383, https://github.com/pytorch/captum/pull/1406, https://github.com/pytorch/captum/pull/1405, https://github.com/pytorch/captum/pull/1404, https://github.com/pytorch/captum/pull/1403, https://github.com/pytorch/captum/pull/1402, https://github.com/pytorch/captum/pull/1401, https://github.com/pytorch/captum/pull/1400, https://github.com/pytorch/captum/pull/1399, https://github.com/pytorch/captum/pull/1398, https://github.com/pytorch/captum/pull/1397, https://github.com/pytorch/captum/pull/1396, https://github.com/pytorch/captum/pull/1395, https://github.com/pytorch/captum/pull/1394, https://github.com/pytorch/captum/pull/1393, https://github.com/pytorch/captum/pull/1392, https://github.com/pytorch/captum/pull/1391, https://github.com/pytorch/captum/pull/1390, https://github.com/pytorch/captum/pull/1385, https://github.com/pytorch/captum/pull/1412, https://github.com/pytorch/captum/pull/1409, https://github.com/pytorch/captum/pull/1411, https://github.com/pytorch/captum/pull/1418, https://github.com/pytorch/captum/pull/1416, https://github.com/pytorch/captum/pull/1415, https://github.com/pytorch/captum/pull/1414, https://github.com/pytorch/captum/pull/1421, https://github.com/pytorch/captum/pull/1424, https://github.com/pytorch/captum/pull/1365, https://github.com/pytorch/captum/pull/1427, https://github.com/pytorch/captum/pull/1425, https://github.com/pytorch/captum/pull/1428, https://github.com/pytorch/captum/pull/1433, https://github.com/pytorch/captum/pull/1434, https://github.com/pytorch/captum/pull/1431, https://github.com/pytorch/captum/pull/1437, https://github.com/pytorch/captum/pull/1438, https://github.com/pytorch/captum/pull/1439, https://github.com/pytorch/captum/pull/1441, https://github.com/pytorch/captum/pull/1448, https://github.com/pytorch/captum/pull/1453, https://github.com/pytorch/captum/pull/1455, https://github.com/pytorch/captum/pull/1459, https://github.com/pytorch/captum/pull/1457, https://github.com/pytorch/captum/pull/1458, https://github.com/pytorch/captum/pull/1461, https://github.com/pytorch/captum/pull/1462, https://github.com/pytorch/captum/pull/1463, https://github.com/pytorch/captum/pull/1464, https://github.com/pytorch/captum/pull/1465, https://github.com/pytorch/captum/pull/1466, https://github.com/pytorch/captum/pull/1467, https://github.com/pytorch/captum/pull/1469, https://github.com/pytorch/captum/pull/1470, https://github.com/pytorch/captum/pull/1471, https://github.com/pytorch/captum/pull/1472, https://github.com/pytorch/captum/pull/1474, https://github.com/pytorch/captum/pull/1475, https://github.com/pytorch/captum/pull/1476, https://github.com/pytorch/captum/pull/1477, https://github.com/pytorch/captum/pull/1479, 
https://github.com/pytorch/captum/pull/1480, https://github.com/pytorch/captum/pull/1481, https://github.com/pytorch/captum/pull/1482, https://github.com/pytorch/captum/pull/1503, https://github.com/pytorch/captum/pull/1502)
## Minor Changes and Fixes
- Added a fix to IntegratedGradients to fully support the MPS backend (PR https://github.com/pytorch/captum/pull/1227)
- Added support for the latest version of the black code formatter (PR https://github.com/pytorch/captum/pull/1241)
- Improved the test case coverage, logic, stability, and speed across Captum, especially for layer-based attribution methods, LLM attribution, and captum.influence methods and utilities (PRs https://github.com/pytorch/captum/pull/1250, https://github.com/pytorch/captum/pull/1251, https://github.com/pytorch/captum/pull/1253, https://github.com/pytorch/captum/pull/1258, https://github.com/pytorch/captum/pull/1243, https://github.com/pytorch/captum/pull/1249, https://github.com/pytorch/captum/pull/1252, https://github.com/pytorch/captum/pull/1259, https://github.com/pytorch/captum/pull/1260, https://github.com/pytorch/captum/pull/1262, https://github.com/pytorch/captum/pull/1264, https://github.com/pytorch/captum/pull/1265, https://github.com/pytorch/captum/pull/1272, https://github.com/pytorch/captum/pull/1300, https://github.com/pytorch/captum/pull/1301, https://github.com/pytorch/captum/pull/1302, https://github.com/pytorch/captum/pull/1323, https://github.com/pytorch/captum/pull/1352, https://github.com/pytorch/captum/pull/1362, https://github.com/pytorch/captum/pull/1364, https://github.com/pytorch/captum/pull/1388, https://github.com/pytorch/captum/pull/1408, https://github.com/pytorch/captum/pull/1410, https://github.com/pytorch/captum/pull/1419, https://github.com/pytorch/captum/pull/1422, https://github.com/pytorch/captum/pull/1436, https://github.com/pytorch/captum/pull/1454, https://github.com/pytorch/captum/pull/1484, https://github.com/pytorch/captum/pull/1485, https://github.com/pytorch/captum/pull/1492)
- Improved LLM attribution plotting aesthetics and text readability (PRs https://github.com/pytorch/captum/pull/1348, https://github.com/pytorch/captum/pull/1349, https://github.com/pytorch/captum/pull/1351, https://github.com/pytorch/captum/pull/1354, https://github.com/pytorch/captum/pull/1355, https://github.com/pytorch/captum/pull/1360, https://github.com/pytorch/captum/pull/1417)
- Freed autograd graphs between LLM attribution calls (PR https://github.com/pytorch/captum/pull/1347)
- Fixed a data type bug in the Titanic tutorial (tutorials/Titanic_Basic_Interpret.ipynb) (PR https://github.com/pytorch/captum/pull/1331)
- Fixed multiple device-related bugs for feature ablation/permutation masks and LLM attribution (PRs https://github.com/pytorch/captum/pull/1245, https://github.com/pytorch/captum/pull/1307)
- Reduced the complexity of various functions throughout Captum (PRs https://github.com/pytorch/captum/pull/1368, https://github.com/pytorch/captum/pull/1372, https://github.com/pytorch/captum/pull/1369, https://github.com/pytorch/captum/pull/1370, https://github.com/pytorch/captum/pull/1374, https://github.com/pytorch/captum/pull/1375, https://github.com/pytorch/captum/pull/1376, https://github.com/pytorch/captum/pull/1378, https://github.com/pytorch/captum/pull/1380, https://github.com/pytorch/captum/pull/1384, https://github.com/pytorch/captum/pull/1407)
- Fixed a bug in the tutorial parsing script (PR https://github.com/pytorch/captum/pull/1268)