llama.cpp support

#1 opened by ayyylol

Hey Rhymes team, I've got a dream,
To use llama.cpp and make this model beam!

I've searched far and wide, through libraries so fine,
But none compare to llama.cpp, it's truly like fine wine.

I know you're busy, with AI on your mind,
But I hope you'll consider, this humble request of mine.

I'd love for you to integrate, llama.cpp with a delight,
And make my prompting life, a joyous sight.

So please, dear Rhymes, don't be slow,
Create a llama.cpp pull request, and let our prompts glow!

I'll be grateful, and shout it from the roof,
If you'll just make our llama.cpp dream, an inference truth!

llama.cpp seems to be slow at implementing multimodal models these days; this might never come :/

Are there any good alternatives to Ollama that don't use llama.cpp? I agree that implementations, especially multimodal ones, are super slow/delayed... I would switch immediately if there were a drop-in Ollama replacement, because I host everything locally.

You can use transformers and compile + quantize the model. We'll come up with a cool pre-config for all that soon!
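
In the meantime, here's a rough, untested sketch of what that could look like with current transformers APIs: 4-bit quantization via bitsandbytes plus `torch.compile`. The model id is an assumption (substitute the actual repo), and whether `torch.compile` plays nicely with 4-bit kernels depends on your library versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

model_id = "rhymes-ai/Aria"  # assumed repo id; substitute the actual model

# Quantize weights to 4-bit on load; compute in bf16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",        # requires the accelerate package
    trust_remote_code=True,
)

# torch.compile speeds up repeated forward passes after a warmup run.
model = torch.compile(model)
```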

Another vote for llama.cpp please. I wouldn't have the foggiest how to compile and run it myself 🀣


Hoping for some GPTQ quantization for us GPU-poor folks with older GPUs (sm 7.0 or sm 7.5).
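
For reference, a rough sketch of how a 4-bit GPTQ quant could be produced with transformers' `GPTQConfig` (requires the optimum and auto-gptq packages, plus a GPU for the calibration pass). The model id, calibration dataset, and output path are assumptions, and there's no guarantee the multimodal parts quantize cleanly this way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "rhymes-ai/Aria"  # assumed repo id; substitute the actual model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# 4-bit GPTQ; calibration runs during from_pretrained using the given dataset.
gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",  # assumed calibration dataset
    tokenizer=tokenizer,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
    trust_remote_code=True,
)

# Save the quantized weights for reuse on older (sm 7.0/7.5) cards.
model.save_pretrained("aria-gptq-4bit")  # assumed output path
```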
