Spaces: open-llm-leaderboard/open_llm_leaderboard
Resources

Resource: Understanding the new benchmarks

#796 opened 3 months ago by

💬 Discussion thread: Model contamination techniques 💬

#472 opened 10 months ago by

💎 Resources and community initiatives around the Leaderboard! 💎

#174 opened about 1 year ago by

Model evaluation disappeared??? (rombodawg/Replete-LLM-V2.5-Qwen-32b_Duplicated)

#967 opened about 6 hours ago by

Failed test

#966 opened about 17 hours ago by

Failed test

#965 opened about 18 hours ago by

How to understand the difference between local report and the scores reported on open-llm-leaderboard/

#964 opened 1 day ago by

Feature Request: Make Adapter/ Delta Models specify both its and the base model's commit

#957 opened 5 days ago by

Consider filtering for MoE models

#956 opened 5 days ago by

Feature Request: change request file format to disambiguate chat and non-chat models?

#954 opened 6 days ago by

Eval time vs. score diagram

#950 opened 8 days ago by

Missing results for xinchen9/Mistral-7B-CoT

#949 opened 9 days ago by

Normalization for MMLU-Pro doesn't make sense

#947 opened 11 days ago by

simplify_ux

#944 opened 12 days ago by

Are Qwen models pretrained or continuously pretrained?

#941 opened 16 days ago by

Incorrect ifeval benchmark

#879 opened about 2 months ago by

Increasing upper limit of `Select the number of parameters (B)` to support larger open-source models like `meta-llama/Meta-Llama-3.1-405B-Instruct`

#858 opened 2 months ago by singhsidhukuldeep

Upvote to evaluate deepseek-coder-v2

#793 opened 3 months ago by

Feature request: Add toggle to only show models with linked dataset

#763 opened 4 months ago by

Feature request: Hide models with insufficient model card from default view in leaderboard

#762 opened 4 months ago by

Discussion: naming pattern to converge on to better identify fine-tunes

#761 opened 4 months ago by

Crowd-Source Hardware for the LeaderBoard?

#570 opened 8 months ago by

Feature request: Using weights hash to identify duplicates

#422 opened 10 months ago by

Tool: Adding evaluation results to model cards

#370 opened 11 months ago by

Feature suggestion: average of selected (rather than all) columns

#368 opened 11 months ago by

Tool: Open LLM Leaderboard Model Renamer

#310 opened 12 months ago by