Benchmarks?
#2 by rombodawg - opened
Can we get this submitted to the Open LLM Leaderboard? A HumanEval score would be nice too.
Looks like someone submitted it to the leaderboard. I can run some additional benchmarks once the DPO version finishes, to compare both. However, there seems to be some sort of issue with the model's performance on GSM8K.
jondurbin changed discussion status to closed