open-llm-leaderboard/open_llm_leaderboard · Consider filtering for MoE models

I want to compare MoE models with each other. This is NOT easy, because it is only possible to "hide" them. It is not possible to hide dense models or filter for them. Their naming scheme is not standardized. They are not following the [GGUF naming convention][https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention] or any other standard, as far as I am aware, so it is hard to search for them via the search feature.

Consider the following MoE model names:

allenai/OLMoE-1B-7B-0924
microsoft/GRIN-MoE
Qwen/Qwen1.5-MoE-A2.7B-Chat
Jamba-12B-52B
Qwen/Qwen1.5-3B-14B
JetMoE-2B-9B
OpenMoE-2B-9B
Arctic-17B-480B
mistralai/Mixtral-8x7B-Instruct-v0.1

Not all of them have "MoE" in the model name.

The difficulty in finding them is that the parameter count is divided into

a) activated parameters
b) total parameter count

Total parameter count and activated parameter count should not be confused with the number of experts per layers in the model and number of activated experts per layer respectively.