Performance oddities of the 3M model.
#3 opened 5 months ago
by
MartialTerran
GPT-2 model having16 4-float attention heads
#2 opened 5 months ago
by
MartialTerran
Adding `safetensors` variant of this model
#1 opened 11 months ago
by
SFconvertbot