Spaces:
Running
on
CPU Upgrade
Resource: Understanding the new benchmarks
pinned
2
#796 opened 3 months ago
by
rombodawg
💬 Discussion thread: Model contamination techniques 💬
pinned
34
#472 opened 10 months ago
by
clefourrier
💎 Resources and community initiatives around the Leaderboard! 💎
pinned
#174 opened about 1 year ago
by
clefourrier
Model evaluation disappeared??? (rombodawg/Replete-LLM-V2.5-Qwen-32b_Duplicated)
#967 opened about 6 hours ago
by
rombodawg
Failed test
#966 opened about 17 hours ago
by
zelk12
Failed test
#965 opened about 18 hours ago
by
legolasyiu
How to understand the difference between a local report and the scores reported on open-llm-leaderboard/
#964 opened 1 day ago
by
xinchen9
Feature Request: Make adapter/delta models specify both their own and the base model's commit
1
#957 opened 5 days ago
by
CombinHorizon
Consider filtering for MoE models
1
#956 opened 5 days ago
by
ThiloteE
Feature Request: change request file format to disambiguate chat and non-chat models?
1
#954 opened 6 days ago
by
CombinHorizon
Eval time vs. score diagram
3
#950 opened 8 days ago
by
HenkPoley
Missing results for xinchen9/Mistral-7B-CoT
1
#949 opened 9 days ago
by
xinchen9
Normalization for MMLU-Pro doesn't make sense
10
#947 opened 11 days ago
by
ekurtic
simplify_ux
2
#944 opened 12 days ago
by
clefourrier
Are Qwen models pretrained or continuously pretrained?
5
#941 opened 16 days ago
by
djstrong
Incorrect ifeval benchmark
6
#879 opened about 2 months ago
by
DavidGF
Increasing upper limit of `Select the number of parameters (B)` to support larger open-source models like `meta-llama/Meta-Llama-3.1-405B-Instruct`
5
#858 opened 2 months ago
by
singhsidhukuldeep
Upvote to evaluate deepseek-coder-v2
3
#793 opened 3 months ago
by
g1y5x3
Feature request: Add toggle to only show models with linked dataset
1
#763 opened 4 months ago
by
ThiloteE
Feature request: Hide models with insufficient model card from default view in leaderboard
4
#762 opened 4 months ago
by
ThiloteE
Discussion: naming pattern to converge on to better identify fine-tunes
17
#761 opened 4 months ago
by
ThiloteE
Crowd-Source Hardware for the LeaderBoard?
4
#570 opened 8 months ago
by
ibivibiv
Feature request: Using weights hash to identify duplicates
1
#422 opened 10 months ago
by
mrfakename
Tool: Adding evaluation results to model cards
47
#370 opened 11 months ago
by
Weyaxi
Feature suggestion: average of selected (rather than all) columns
4
#368 opened 11 months ago
by
Minus0
Tool: Open LLM Leaderboard Model Renamer
31
#310 opened 12 months ago
by
Weyaxi