Feature Request: change request file format to disambiguate chat and non-chat models?

#954
by CombinHorizon - opened

For example, instead of:
ModelName-SizeB_eval_request_False_bfloat16_Original.json
perhaps:

ModelName-SizeB_eval_request_False_bfloat16_ChatOn_Original.json
ModelName-SizeB_eval_request_False_bfloat16_ChatOff_Original.json

  • so that requests don't overwrite each other on reruns; the scores also seem to differ significantly (especially for some models) depending on this configuration
  • so that chat-on and chat-off entries are listed separately on the leaderboard
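A minimal sketch (with a hypothetical helper and field names, just to illustrate the proposed naming scheme) of how the chat-template flag could be folded into the request filename so the two configurations get distinct files:

```python
def request_filename(model, precision, use_chat_template, weight_type="Original"):
    # Hypothetical helper, only to illustrate the proposed naming scheme;
    # "False" mirrors the fixed flag already present in current request filenames.
    chat_flag = "ChatOn" if use_chat_template else "ChatOff"
    return f"{model}_eval_request_False_{precision}_{chat_flag}_{weight_type}.json"

# The two configurations no longer collide on reruns:
print(request_filename("ModelName-SizeB", "bfloat16", use_chat_template=True))
# -> ModelName-SizeB_eval_request_False_bfloat16_ChatOn_Original.json
print(request_filename("ModelName-SizeB", "bfloat16", use_chat_template=False))
# -> ModelName-SizeB_eval_request_False_bfloat16_ChatOff_Original.json
```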

From what I'm seeing, for some similar models the chat template seems to raise IFEval scores (⇈) and lower MUSR scores (⇊), but by how much?

If this is updated, it may help to look at the request files' commit history, and at the multiple result files (which don't overwrite each other), to disambiguate and sort out the existing entries.
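If useful, a rough sketch of pulling that commit history with huggingface_hub (the repo id here is an assumption; adjust it to the actual requests dataset):

```python
from huggingface_hub import HfApi

api = HfApi()

# Assumed repo id for the leaderboard's request files; adjust if it differs.
commits = api.list_repo_commits("open-llm-leaderboard/requests", repo_type="dataset")

# Commit titles typically mention the request file that was added or updated,
# which can help reconstruct whether a given rerun had the chat template on.
for commit in commits:
    print(commit.created_at, commit.title)
```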

The chat template's effect on scores seems to be more significant than that of bfloat16 vs float16.

Question: what determines which chat template is used? Which file or process (e.g. generation_config.json)? What other assumptions or defaults apply?
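For context, here is how I would check which template a given model ends up with in transformers; as far as I can tell the template normally comes from the chat_template field of tokenizer_config.json rather than generation_config.json (the model id below is just a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder model id; any model that ships a chat template works the same way.
tokenizer = AutoTokenizer.from_pretrained("org/ModelName-SizeB")

# The raw Jinja template, usually loaded from tokenizer_config.json ("chat_template").
print(tokenizer.chat_template)

# What a chat-enabled evaluation harness would actually feed to the model.
messages = [{"role": "user", "content": "What is 2 + 2?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```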

Open LLM Leaderboard org

Hi @CombinHorizon ,

Thank you for your suggestion!

We agree that this modification can help compare a model with and without the chat template. We're actually in the process of revamping our request naming system, as some current parameters are no longer relevant.

We'll come back to you as soon as we have decided on a new simpler format!
