potsawee committed
Commit: f53f896
1 Parent(s): 3fcd7df

add text to about tab

Files changed (1): src/about.py (+8 -2)
src/about.py CHANGED
src/about.py CHANGED
@@ -20,11 +20,17 @@ INTRODUCTION_TEXT = """
 """
 
 LLM_BENCHMARKS_TEXT = f"""
-TODO write about page here
+This leaderboard presents hallucination benchmarks for multimodal LLMs on tasks with different input modalities, including image captioning and video captioning. For each task, we measure the hallucination level of the text outputs of various multimodal LLMs using existing hallucination metrics.
+
+Some metrics, such as POPE*, CHAIR, and UniHD, are designed specifically for image-to-text tasks and thus are not directly applicable to video-to-text tasks. For the image-to-text benchmark, we also provide a ranking based on human ratings, where annotators were asked to rate the outputs of the multimodal LLMs on MHaluBench. *Note that the POPE paper proposed both a dataset and a method.
+
+More information about each existing metric can be found in its respective paper; CrossCheckGPT is proposed in https://arxiv.org/pdf/2405.13684.
+
+The leaderboard does not yet support automatic evaluation of new models, but you are welcome to request an evaluation of a new model by opening a new discussion or emailing us at potsawee@scb10x.com.
 """
 
 EVALUATION_QUEUE_TEXT = """
-TODO write this
+The leaderboard does not yet support automatic evaluation of new models, but you are welcome to request an evaluation of a new model by opening a new discussion or emailing us at potsawee@scb10x.com.
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
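
For context, here is a minimal sketch of how string constants like these are typically rendered in a Gradio leaderboard Space. The tab names, layout, and `demo` object below are illustrative assumptions, not this repo's actual app code:

```python
# Hypothetical usage sketch -- NOT part of this commit. It assumes a
# Gradio app that renders the about-tab strings defined in src/about.py;
# the tab names and layout are assumptions, not the repo's actual app.py.
import gradio as gr

from src.about import (
    INTRODUCTION_TEXT,
    LLM_BENCHMARKS_TEXT,
    EVALUATION_QUEUE_TEXT,
)

with gr.Blocks() as demo:
    # Intro text shown above the tabs
    gr.Markdown(INTRODUCTION_TEXT)
    with gr.Tabs():
        with gr.TabItem("About"):
            # The text added in this commit would be rendered here
            gr.Markdown(LLM_BENCHMARKS_TEXT)
        with gr.TabItem("Submit"):
            gr.Markdown(EVALUATION_QUEUE_TEXT)

if __name__ == "__main__":
    demo.launch()
```

Note that LLM_BENCHMARKS_TEXT is declared as an f-string, so any literal braces added to its body later would be treated as interpolation placeholders; the text added in this commit contains none, so it is unaffected.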