Natooz committed
Commit 1f331a4
Parent: 72ac970
.idea/.gitignore ADDED
@@ -0,0 +1,3 @@
+ # Default ignored files
+ /shelf/
+ /workspace.xml
.idea/ece.iml ADDED
@@ -0,0 +1,12 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <module type="PYTHON_MODULE" version="4">
+   <component name="NewModuleRootManager">
+     <content url="file://$MODULE_DIR$" />
+     <orderEntry type="jdk" jdkName="Python 3.8 (venv) (6)" jdkType="Python SDK" />
+     <orderEntry type="sourceFolder" forTests="false" />
+   </component>
+   <component name="PyDocumentationSettings">
+     <option name="format" value="PLAIN" />
+     <option name="myDocStringFormat" value="Plain" />
+   </component>
+ </module>
.idea/inspectionProfiles/Project_Default.xml ADDED
@@ -0,0 +1,14 @@
+ <component name="InspectionProjectProfileManager">
+   <profile version="1.0">
+     <option name="myName" value="Project Default" />
+     <inspection_tool class="PyPackageRequirementsInspection" enabled="true" level="WARNING" enabled_by_default="true">
+       <option name="ignoredPackages">
+         <value>
+           <list size="1">
+             <item index="0" class="java.lang.String" itemvalue="torch" />
+           </list>
+         </value>
+       </option>
+     </inspection_tool>
+   </profile>
+ </component>
.idea/inspectionProfiles/profiles_settings.xml ADDED
@@ -0,0 +1,6 @@
+ <component name="InspectionProjectProfileManager">
+   <settings>
+     <option name="USE_PROJECT_PROFILE" value="false" />
+     <version value="1.0" />
+   </settings>
+ </component>
.idea/misc.xml ADDED
@@ -0,0 +1,4 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.8 (venv) (6)" project-jdk-type="Python SDK" />
+ </project>
.idea/modules.xml ADDED
@@ -0,0 +1,8 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="ProjectModuleManager">
+     <modules>
+       <module fileurl="file://$PROJECT_DIR$/.idea/ece.iml" filepath="$PROJECT_DIR$/.idea/ece.iml" />
+     </modules>
+   </component>
+ </project>
.idea/vcs.xml ADDED
@@ -0,0 +1,6 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="VcsDirectoryMappings">
+     <mapping directory="" vcs="Git" />
+   </component>
+ </project>
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
  tags:
  - evaluate
  - metric
- description: "TODO: add a description here"
+ description: "Expected calibration error (ECE)"
  sdk: gradio
  sdk_version: 3.19.1
  app_file: app.py
@@ -14,37 +14,51 @@ pinned: false
 
  # Metric Card for ECE
 
- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
  ## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*
 
- ## How to Use
- *Give general statement of how to use the metric*
+ This metric computes the expected calibration error (ECE).
+ It directly calls the torchmetrics package:
+ https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html
 
- *Provide simplest possible example for using the metric*
+ ## How to Use
 
  ### Inputs
  *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
+ - **predictions** *(tensor or numpy array, float32): predictions (after softmax). They must have a shape (N,C,...) if multiclass, or (N,...) if binary.*
+ - **references** *(tensor or numpy array, int64): reference label for each prediction, with a shape (N,...).*
 
  ### Output Values
 
- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-
- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
-
- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
+ The ECE, as a float.
 
  ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
 
- ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*
+ ```Python
+ ece = evaluate.load("Natooz/ece")
+ results = ece.compute(
+     predictions=np.array([[0.25, 0.20, 0.55],
+                           [0.55, 0.05, 0.40],
+                           [0.10, 0.30, 0.60],
+                           [0.90, 0.05, 0.05]]),
+     references=np.array([0, 1, 2, 0]),
+     num_classes=3,
+     n_bins=3,
+     norm="l1",
+ )
+ print(results)
+ ```
 
  ## Citation
- *Cite the source where this metric was introduced.*
 
- ## Further References
- *Add any useful further references.*
+ ```bibtex
+ @inproceedings{NEURIPS2019_f8c0c968,
+     author = {Kumar, Ananya and Liang, Percy S and Ma, Tengyu},
+     booktitle = {Advances in Neural Information Processing Systems},
+     editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
+     publisher = {Curran Associates, Inc.},
+     title = {Verified Uncertainty Calibration},
+     url = {https://papers.nips.cc/paper_files/paper/2019/hash/f8c0c968632845cd133308b1a494967f-Abstract.html},
+     volume = {32},
+     year = {2019}
+ }
+ ```
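Note: the card above documents two input shapes, (N,C) class probabilities for the multiclass case and (N,) probabilities of the positive class for the binary case, but only shows a multiclass call. The sketch below illustrates what a binary call could look like, assuming the metric forwards `n_bins` and `norm` to torchmetrics' `binary_calibration_error` as in the `_compute` method further down; the probabilities and labels are made up for illustration.

```Python
import evaluate
import numpy as np

ece = evaluate.load("Natooz/ece")

# Binary case: one probability of the positive class per sample, shape (N,)
results = ece.compute(
    predictions=np.array([0.90, 0.35, 0.60, 0.20], dtype=np.float32),
    references=np.array([1, 0, 1, 0]),  # made-up binary labels
    n_bins=3,
    norm="l1",
)
print(results)  # a dict such as {'ece': ...}
```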
ece.py CHANGED
@@ -11,58 +11,63 @@
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
- """TODO: Add a description here."""
+
+ from typing import Dict
 
  import evaluate
  import datasets
+ from torch import from_numpy, amax
+ from torchmetrics.functional.classification.calibration_error import binary_calibration_error, multiclass_calibration_error
+ from numpy import ndarray
 
 
- # TODO: Add BibTeX citation
  _CITATION = """\
- @InProceedings{huggingface:module,
-     title = {A great new module},
-     authors={huggingface, Inc.},
-     year={2020}
+ @InProceedings{huggingface:ece,
+     title = {Expected calibration error (ECE)},
+     authors={Nathan Fradet},
+     year={2023}
  }
  """
 
- # TODO: Add description of the module here
  _DESCRIPTION = """\
- This new module is designed to solve this great ML task and is crafted with a lot of care.
+ This metric computes the expected calibration error (ECE).
+ It directly calls the torchmetrics package:
+ https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html
  """
 
 
- # TODO: Add description of the arguments of the module here
  _KWARGS_DESCRIPTION = """
  Calculates how good are predictions given some references, using certain scores
  Args:
-     predictions: list of predictions to score. Each predictions
-         should be a string with tokens separated by spaces.
-     references: list of reference for each prediction. Each
-         reference should be a string with tokens separated by spaces.
+     predictions: predictions to score (probabilities after softmax), with a shape (N,C,...) if multiclass, or (N,...) if binary.
+     references: reference label for each prediction, with a shape (N,...).
  Returns:
-     accuracy: description of the first score,
-     another_score: description of the second score,
+     ece: expected calibration error
  Examples:
-     Examples should be written in doctest format, and should illustrate how
-     to use the function.
-
-     >>> my_new_module = evaluate.load("my_new_module")
-     >>> results = my_new_module.compute(references=[0, 1], predictions=[0, 1])
+     >>> ece = evaluate.load("Natooz/ece")
+     >>> results = ece.compute(
+     ...     predictions=np.array([[0.25, 0.20, 0.55],
+     ...                           [0.55, 0.05, 0.40],
+     ...                           [0.10, 0.30, 0.60],
+     ...                           [0.90, 0.05, 0.05]]),
+     ...     references=np.array([0, 1, 2, 0]),
+     ...     num_classes=3,
+     ...     n_bins=3,
+     ...     norm="l1",
+     ... )
      >>> print(results)
-     {'accuracy': 1.0}
+     {'ece': 0.2000}
  """
 
- # TODO: Define external resources urls if needed
- BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
-
 
  @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
  class ECE(evaluate.Metric):
-     """TODO: Short description of my evaluation module."""
+     """
+     Proxy to the calibration error (ECE) metrics of the torchmetrics package:
+     https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html
+     """
 
      def _info(self):
-         # TODO: Specifies the evaluate.EvaluationModuleInfo object
          return evaluate.MetricInfo(
              # This is the description that will appear on the modules page.
              module_type="metric",
@@ -71,25 +76,41 @@ class ECE(evaluate.Metric):
              inputs_description=_KWARGS_DESCRIPTION,
              # This defines the format of each prediction and reference
              features=datasets.Features({
-                 'predictions': datasets.Value('int64'),
+                 'predictions': datasets.Value('float32'),
                  'references': datasets.Value('int64'),
              }),
              # Homepage of the module for documentation
-             homepage="http://module.homepage",
+             homepage="https://huggingface.co/spaces/Natooz/ece",
              # Additional links to the codebase or references
-             codebase_urls=["http://github.com/path/to/codebase/of/new_module"],
-             reference_urls=["http://path.to.reference.url/new_module"]
+             codebase_urls=["https://github.com/Lightning-AI/torchmetrics/blob/v0.11.4/src/torchmetrics/classification/calibration_error.py"],
+             reference_urls=["https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html"]
          )
 
-     def _download_and_prepare(self, dl_manager):
-         """Optional: download external resources useful to compute the scores"""
-         # TODO: Download external resources if needed
-         pass
+     def _compute(self, predictions=None, references=None, **kwargs) -> Dict[str, float]:
+         """Returns the ECE.
+         See the torchmetrics documentation for more information on the arguments to pass:
+         https://torchmetrics.readthedocs.io/en/stable/classification/calibration_error.html
+         predictions: (N,C,...) if multiclass or (N,...) if binary
+         references: (N,...)
+
+         If "num_classes" is not provided in a multiclass setting, the maximum label index + 1 will
+         be used as "num_classes".
+         """
+         # Convert numpy inputs to torch tensors
+         if isinstance(predictions, ndarray):
+             predictions = from_numpy(predictions)
+         if isinstance(references, ndarray):
+             references = from_numpy(references)
+
+         max_label = amax(references, list(range(references.dim())))
+         if max_label > 1 and "num_classes" not in kwargs:
+             kwargs["num_classes"] = int(max_label) + 1
 
-     def _compute(self, predictions, references):
-         """Returns the scores"""
-         # TODO: Compute the different scores of the module
-         accuracy = sum(i == j for i, j in zip(predictions, references)) / len(predictions)
+         # Compute the calibration error (multiclass when labels go beyond {0, 1})
+         if max_label > 1:
+             ece = multiclass_calibration_error(predictions, references, **kwargs)
+         else:
+             ece = binary_calibration_error(predictions, references, **kwargs)
          return {
-             "accuracy": accuracy,
-         }
+             "ece": float(ece),
+         }
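As a sanity check on the doctest example above, the reported value can be reproduced with numpy alone. This is a minimal sketch assuming torchmetrics' default top-label binning (uniform bins over the maximum predicted probability) with the L1 norm; the reference labels [0, 1, 2, 0] are the illustrative ones used in the example.

```Python
import numpy as np

# Probabilities and (illustrative) labels from the docstring example above
probs = np.array([[0.25, 0.20, 0.55],
                  [0.55, 0.05, 0.40],
                  [0.10, 0.30, 0.60],
                  [0.90, 0.05, 0.05]])
labels = np.array([0, 1, 2, 0])
n_bins = 3

conf = probs.max(axis=1)               # top-label confidence per sample
hits = probs.argmax(axis=1) == labels  # whether the top label is correct
edges = np.linspace(0.0, 1.0, n_bins + 1)

ece = 0.0
for low, high in zip(edges[:-1], edges[1:]):
    in_bin = (conf > low) & (conf <= high)
    if in_bin.any():
        # |accuracy - mean confidence| weighted by the bin's share of samples
        ece += in_bin.mean() * abs(hits[in_bin].mean() - conf[in_bin].mean())

print(round(ece, 4))  # 0.2, matching the {'ece': 0.2000} shown in the docstring
```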