yentinglin committed
Commit 6edca0c
Parent: 4483cea

Update src/about.py

Files changed (1):
src/about.py (+17 -5)
src/about.py CHANGED
@@ -27,17 +27,29 @@ TITLE = """<h1 align="center" id="space-title">Demo leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-Intro text
+This leaderboard showcases the performance of large language models (LLMs) on various Taiwanese Mandarin language understanding tasks. The models are evaluated on their accuracy across different benchmarks, providing insights into their strengths and weaknesses in comprehending and generating Taiwanese Mandarin text.
+這個排行榜展示了大型語言模型 (LLMs) 在各種臺灣繁體中文語言理解任務上的表現。
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
-## How it works
-
-## Reproducibility
-To reproduce our results, here is the commands you can run:
-
-please checkout this command https://github.com/adamlin120/lm-evaluation-harness/blob/main/run_all.sh
+The leaderboard evaluates LLMs on the following benchmarks:
+
+1. TMLU (Taiwanese Mandarin Language Understanding): Measures the model's ability to understand Taiwanese Mandarin text across various domains.
+2. TW Truthful QA: Assesses the model's capability to provide truthful answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
+3. TW Legal Eval: Evaluates the model's understanding of legal terminology and concepts in Taiwanese Mandarin, using questions from the Taiwanese bar exam for lawyers.
+4. MMLU (Massive Multitask Language Understanding): Tests the model's performance on a wide range of tasks in English.
+
+To reproduce our results, please follow the instructions in the provided GitHub repository: https://github.com/adamlin120/lm-evaluation-harness/blob/main/run_all.sh
+
+該排行榜在以下考題上評估 LLMs:
+
+1. TMLU(臺灣國語語言理解):衡量模型理解各個領域臺灣國語文本的能力。
+2. TW Truthful QA:評估模型以臺灣國語提供真實答案的能力,重點關注臺灣特定的背景。
+3. TW Legal Eval:使用臺灣律師資格考試的問題,評估模型對臺灣國語法律術語和概念的理解。
+4. MMLU(大規模多任務語言理解):測試模型在英語中各種任務上的表現。
+
+要重現我們的結果,請按照:https://github.com/adamlin120/lm-evaluation-harness/blob/main/run_all.sh
 """
 
 EVALUATION_QUEUE_TEXT = """