"How to run in llama.cpp" is for 2048 instead of 8192 context size

by wolfram - opened Aug 2, 2023

Aug 2, 2023 • edited Aug 2, 2023

On the model card, under "How to run in llama.cpp", the example command line limits the context to 2048 tokens instead of the 8192 this model supports:

./main -t 10 -ngl 32 -m hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

What would the proper command be? Besides raising the context size, does it require scaling parameters like --rope-freq-base or --rope-freq-scale?

TheBloke

Owner Aug 2, 2023

Yes, it does. I'll update that. I believe the flags are -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5
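For reference, here is a rough sketch of the model card command with those flags swapped in for -c 2048. It only assembles and prints the command so it can be checked before running; the variable names (MODEL, CTX, ROPE_BASE, ROPE_SCALE) are illustrative, and the values are the ones believed correct in this thread, not verified settings.

```shell
#!/usr/bin/env bash
# Sketch: model card command with the context and RoPE flags from this thread.
# Variable names are illustrative; values are the ones suggested above.
MODEL="hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin"
CTX=8192          # replaces the original -c 2048
ROPE_BASE=10000   # --rope-freq-base
ROPE_SCALE=0.5    # --rope-freq-scale

cmd=(./main -t 10 -ngl 32 -m "$MODEL" --color
     -c "$CTX" --rope-freq-base "$ROPE_BASE" --rope-freq-scale "$ROPE_SCALE"
     --temp 0.7 --repeat_penalty 1.1 -n -1
     -p "### Instruction: Write a story about llamas\n### Response:")

# Print the assembled command for review.
printf '%q ' "${cmd[@]}"; echo

# To actually run it from the llama.cpp directory, uncomment:
# "${cmd[@]}"
```

The 0.5 scale is consistent with linear RoPE scaling from a 4096-token base context to 8192 (4096 / 8192), assuming that is how this model was extended.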

wolfram

Aug 2, 2023

Great, thanks for the confirmation and for updating the information.

It worked for me using the equivalent koboldcpp command line options: --contextsize 8192 --ropeconfig 0.5 10000
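In case it helps anyone else, a minimal sketch of that koboldcpp invocation follows. It assumes koboldcpp is started through its koboldcpp.py script with the model file as the first argument; the variable names are illustrative, and the script only prints the command so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: koboldcpp launch with the options that worked in this thread.
# Assumes koboldcpp.py is in the current directory; variable names are illustrative.
MODEL="hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin"
CTX=8192
ROPE_SCALE=0.5    # first --ropeconfig value, same as --rope-freq-scale
ROPE_BASE=10000   # second --ropeconfig value, same as --rope-freq-base

cmd=(python koboldcpp.py "$MODEL"
     --contextsize "$CTX" --ropeconfig "$ROPE_SCALE" "$ROPE_BASE")

# Print the assembled command for review.
printf '%q ' "${cmd[@]}"; echo

# To actually launch koboldcpp, uncomment:
# "${cmd[@]}"
```

Note that --ropeconfig takes the scale first and the base second, the reverse of the order the two llama.cpp flags are usually written in.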

wolfram changed discussion status to closed Aug 2, 2023
