Christopher

New activity in moreh/MoMo-72B-lora-1.8.7-DPO 8 months ago

Version for Qwen1.5-72B

#9 opened 8 months ago by

New activity in abacusai/Smaug-72B-v0.1 8 months ago

Fine-tune for Qwen1.5

#14 opened 8 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard 8 months ago

Any updates on redesigning the leaderboard?

#595 opened 8 months ago by

152334H/miqu-1-70b-sf marked as private or deleted

#587 opened 8 months ago by

meta-llama/Llama-2-70b-hf is set as "Private or deleted"

5

#580 opened 8 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard 9 months ago

Improvement: "Metrics over time" has private/deleted models

#571 opened 9 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard 10 months ago

Brainstorming: Call for a Time-Sensitive, Rolling-Update Benchmark Crowdsourced by the Community

24

#481 opened 10 months ago by JosephusCheung

Brainstorming: Suggestions for improving the leaderboard

25

#477 opened 10 months ago by xxyyy123

[FLAG] fblgit/una-xaberius-34b-v1beta

125

#444 opened 10 months ago by XXXGGGNEt

Black Box Benchmarks over Contamination Scanning

6

#470 opened 10 months ago by

New activity in DopeorNope/COKAL-v1-70B 10 months ago

High ARC benchmark score

#1 opened 10 months ago by

New activity in ceadar-ie/FinanceConnect-13B 10 months ago

100 on HellaSwag benchmark

#1 opened 10 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard 10 months ago

[FLAG] TigerResearch/tigerbot-70b-chat-v4-4k

23

#438 opened 10 months ago by fblgit

Feature request: Run 100B + models automatically

15

#434 opened 10 months ago by ChuckMcSneed

model was not found on hub!

#433 opened 10 months ago by liuda1

[FLAG?] Tigerbot-70b-chat-v2 scores are too high.

9

#414 opened 11 months ago by

New activity in TigerResearch/tigerbot-70b-chat-v2 10 months ago

High ARC and TruthfulQA scores

#4 opened 10 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard 11 months ago

Add Orca-2 7b and 13b to queue

#397 opened 11 months ago by

Can't sort certain columns

#386 opened 11 months ago by

New activity in open-llm-leaderboard/open_llm_leaderboard about 1 year ago

Improve speed leaderboard front end

7

#249 opened about 1 year ago by Ostixe360

Two airoboros-l2-70b-2.1 models on leaderboard. One with far larger TruthfulQA

#238 opened about 1 year ago by

[FLAG] Voicelab/trurl-2-13b: training data surely includes the test data, right?

6

#202 opened about 1 year ago by

Why are there no OpenAI models here? we need GPT-3.5 and GPT4 to compare!

#169 opened about 1 year ago by FarisHijazi

FreeWilly2 by Stability AI is about to beat GPT3.5

#120 opened about 1 year ago by gsaivinay

New activity in open-llm-leaderboard/open_llm_leaderboard over 1 year ago

Add a column: average score per billion parameters

#88 opened over 1 year ago by rfernand

How long does it take to run these tests?

7

#90 opened over 1 year ago by Goldenblood56

why isn't truthfulQA shown in the leaderboards?

#81 opened over 1 year ago by wfzimmerman

Models for Human/GPT4 Eval

25

#65 opened over 1 year ago by natolambert

[feature request] prioritize the queue (by user voting?)