---
title: ECE
datasets:
- "null"
tags:
- evaluate
- metric
description: "Expected calibration error (ECE)"
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---

# Metric Card for ECE

## Metric Description

This metric computes the expected calibration error (ECE). ECE evaluates how well a model is calibrated, i.e. how closely its output probabilities match the actual ground-truth distribution, by measuring the $L^p$ norm difference between the model's posterior and the true likelihood of being correct.
This module directly calls the [torchmetrics implementation](https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html), so all of its flexible arguments can be used.
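
For reference, with $M$ equally spaced confidence bins $B_m$, the commonly used binned ($L^1$) estimator from Guo et al. (2017) is

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \big| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \big|$$

where $\mathrm{acc}(B_m)$ is the accuracy and $\mathrm{conf}(B_m)$ the mean predicted confidence of the samples falling in bin $B_m$, and $n$ is the total number of samples.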

## How to Use

### Inputs
- **predictions** *(float32)*: model predictions (probabilities after softmax), of shape (N, C) for multiclass or (N, ...) for binary;
- **references** *(int64)*: ground-truth label for each prediction, of shape (N, ...);
- **kwargs**: additional arguments to pass to the [torchmetrics calibration error](https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html) method, e.g. `num_classes`, `n_bins` or `norm`.

### Output Values

The expected calibration error, returned as a float.

### Examples

```python
import evaluate
import numpy as np

ece = evaluate.load("Natooz/ece")
results = ece.compute(
    # predicted class probabilities (after softmax), shape (N, C)
    predictions=np.array([[0.25, 0.20, 0.55],
                          [0.55, 0.05, 0.40],
                          [0.10, 0.30, 0.60],
                          [0.90, 0.05, 0.05]]),
    # example ground-truth labels, shape (N,)
    references=np.array([2, 0, 2, 1]),
    num_classes=3,
    n_bins=3,
    norm="l1",
)
print(results)
```
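
For the binary case, where predictions have shape (N, ...), a minimal sketch (assuming the keyword arguments are forwarded to torchmetrics as in the multiclass example above; the values here are purely illustrative) could look like:

```python
import evaluate
import numpy as np

ece = evaluate.load("Natooz/ece")
results = ece.compute(
    # predicted probability of the positive class, shape (N,)
    predictions=np.array([0.85, 0.23, 0.64, 0.92]),
    # illustrative binary ground-truth labels, shape (N,)
    references=np.array([1, 0, 1, 1]),
    n_bins=3,
    norm="l1",
)
print(results)
```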

## Citation

```bibtex
@InProceedings{pmlr-v70-guo17a,
    title = {On Calibration of Modern Neural Networks},
    author = {Chuan Guo and Geoff Pleiss and Yu Sun and Kilian Q. Weinberger},
    booktitle = {Proceedings of the 34th International Conference on Machine Learning},
    pages = {1321--1330},
    year = {2017},
    editor = {Precup, Doina and Teh, Yee Whye},
    volume = {70},
    series = {Proceedings of Machine Learning Research},
    month = {06--11 Aug},
    publisher = {PMLR},
    pdf = {http://proceedings.mlr.press/v70/guo17a/guo17a.pdf},
    url = {https://proceedings.mlr.press/v70/guo17a.html},
}
```

```bibtex
@inproceedings{NEURIPS2019_f8c0c968,
     author = {Kumar, Ananya and Liang, Percy S and Ma, Tengyu},
     booktitle = {Advances in Neural Information Processing Systems},
     editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
     publisher = {Curran Associates, Inc.},
     title = {Verified Uncertainty Calibration},
     url = {https://papers.nips.cc/paper_files/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html},
     volume = {32},
     year = {2019}
}
```

```bibtex
@InProceedings{Nixon_2019_CVPR_Workshops,
    author = {Nixon, Jeremy and Dusenberry, Michael W. and Zhang, Linchuan and Jerfel, Ghassen and Tran, Dustin},
    title = {Measuring Calibration in Deep Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2019},
    url = {https://openaccess.thecvf.com/content_CVPRW_2019/html/Uncertainty_and_Robustness_in_Deep_Visual_Learning/Nixon_Measuring_Calibration_in_Deep_Learning_CVPRW_2019_paper.html},
}
```