Space: open-llm-leaderboard/open_llm_leaderboard


Eval time vs. score diagram #950

Opened by HenkPoley 9 days ago

HenkPoley, 9 days ago (edited)

On the Portuguese version of the old ('v1') Open LLM Leaderboard, I saw an interesting plot.

See the Metrics tab, and look at the bottom: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard

There you can roughly eyeball the scaling laws. You can also see that models of around 9B parameters can ace these older-style tests.

Maybe add something like that, for example a model size vs. score plot instead of evaluation time vs. score.
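A minimal sketch of the data side of a size vs. score view, in Python. It assumes each leaderboard row can be reduced to a model name, a parameter count in billions, and an average score; the `ModelResult` type and the `size_score_frontier` function are hypothetical and not part of the leaderboard's code. It keeps only the models that set a new best score as size grows, which is roughly the curve you eyeball when looking for scaling trends:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical leaderboard row: model id, parameter count (billions), average score."""
    name: str
    params_b: float
    score: float


def size_score_frontier(results: list[ModelResult]) -> list[ModelResult]:
    """Return the models that set a new best score as parameter count increases.

    Sorting by size and keeping each new running maximum traces the
    "best score reached at or below this size" curve of a size vs. score plot.
    """
    frontier: list[ModelResult] = []
    best = float("-inf")
    # Among models of equal size, visit the highest score first so that
    # only the best model of a given size can join the frontier.
    for r in sorted(results, key=lambda r: (r.params_b, -r.score)):
        if r.score > best:
            frontier.append(r)
            best = r.score
    return frontier
```

The full list and the frontier could then be drawn with any plotting library, for example a matplotlib scatter plot with `plt.xscale("log")` so that 1B, 10B, and 100B models are evenly spaced.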

alozowski (Open LLM Leaderboard org), 5 days ago

Hi @HenkPoley,

This is a very good idea! We're a bit short on time at the moment. Would you be interested in contributing this feature?

CombinHorizon, 4 days ago

Some of the notable models that performed well in Portuguese are:

THUDM/glm-4-9b-chat-1m
THUDM/glm-4-9b-chat
THUDM/glm-4-9b

Unfortunately, they trigger the error message “needs to be launched with trust_remote_code=True”.

Could the model be changed somehow to mitigate this? What are the prospects?

alozowski (Open LLM Leaderboard org), 3 days ago

Hi @CombinHorizon,

We currently have results for THUDM/glm-4-9b and THUDM/glm-4-9b-chat, which we added manually; you can find them on the Leaderboard. If you're interested, we can also add THUDM/glm-4-9b-chat-1m.
