Feature Request: change request file format to disambiguate chat and non-chat models?
For example, instead of:
ModelName-SizeB_eval_request_False_bfloat16_Original.json
perhaps:
ModelName-SizeB_eval_request_False_bfloat16_ChatOn_Original.json
ModelName-SizeB_eval_request_False_bfloat16_ChatOff_Original.json
- so that they don't overwrite each other on reruns
- and, since the scores seem to differ significantly depending on this setting (especially for some models), so that chat-on and chat-off entries are listed separately on the leaderboard
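A minimal sketch of the naming idea, in Python. The function name, its parameters, and the reading of the `False` field as a "private" flag are assumptions made for illustration; this is not the leaderboard's actual code.

```python
from typing import Optional


def request_filename(
    model: str,
    private: bool,
    precision: str,
    weight_type: str,
    chat_template: Optional[bool] = None,
) -> str:
    """Build a request file name (hypothetical helper).

    With chat_template=None the current naming pattern is reproduced;
    with True/False a ChatOn/ChatOff field is added before the weight type.
    """
    # Assumption: the boolean in the existing file names is a "private" flag.
    parts = [model, "eval_request", str(private), precision]
    if chat_template is not None:
        parts.append("ChatOn" if chat_template else "ChatOff")
    parts.append(weight_type)
    return "_".join(parts) + ".json"


current = request_filename("ModelName-SizeB", False, "bfloat16", "Original")
chat_on = request_filename("ModelName-SizeB", False, "bfloat16", "Original", True)
chat_off = request_filename("ModelName-SizeB", False, "bfloat16", "Original", False)
```

With distinct names for the two settings, a rerun with the chat template enabled would no longer replace the request file of a run without it.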
From what I'm seeing for some similar models, the chat template seems to affect IFEval scores, and MUSR too (but by how much?).
If this is to be updated, it may help to look into the request files' commit history, and also the multiple result files (which don't overwrite each other), to disambiguate and sort things out.
The chat template seems to have a more significant effect on scores than bfloat16 vs float16.
Question: what determines which chat template is used? Which file or process (e.g. generation_config.json), and what other assumptions or defaults apply?
Hi @CombinHorizon ,
Thank you for your suggestion!
We agree that this modification would help compare a model with and without the chat template. We're actually in the process of revamping our request naming system, as some current parameters are no longer relevant.
We'll come back to you as soon as we have decided on a new, simpler format!