Text Generation
Transformers
PyTorch
English
llama
text-generation-inference
Inference Endpoints
winglian, TheBloke
Change cache = true in config.json to significantly boost inference performance (#1)
1959f62
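The commit title does not name the exact key it changes. As a minimal sketch, the Python below assumes "cache" refers to the `use_cache` flag that Transformers model configs use to enable key-value caching during generation. The helper name `enable_kv_cache` and the demo file contents are hypothetical and used only for illustration.

```python
import json
import tempfile
from pathlib import Path


def enable_kv_cache(config_path):
    """Set `use_cache` to true in a config.json file and return the updated dict.

    Assumption: `use_cache` is the key the commit title calls "cache".
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["use_cache"] = True
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway file so no real model directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(
        json.dumps({"model_type": "llama", "use_cache": False}),
        encoding="utf-8",
    )
    updated = enable_kv_cache(demo_path)
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))
```

With the cache enabled, generation can reuse attention keys and values from earlier tokens instead of recomputing them at every step, which is the likely source of the speedup the commit title describes.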