"How to run in llama.cpp" example uses a 2048 context size instead of 8192
On the Model card, under "How to run in llama.cpp", the example command line limits the context to 2048 instead of the 8192 this model supports:
./main -t 10 -ngl 32 -m hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
What would the proper command be? Besides raising the context size, does it require scaling parameters like --rope-freq-base or --rope-freq-scale?
Yes, it does. I'll update that. I believe the required flags are -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5
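For reference, here is a sketch of what the full command might look like when the flags from this reply are combined with the original model card example. It only reuses values quoted in this thread. The variable names (MODEL, CTX, ROPE_FREQ_BASE, ROPE_FREQ_SCALE, LLAMA_ARGS) are made up for illustration, and the result has not been verified against a particular llama.cpp build.

```shell
# Hypothetical variable names; the values are the ones quoted in this thread.
MODEL="hermes-llongma-2-13b-8k.ggmlv3.q4_K_M.bin"
CTX=8192                # raised from the 2048 in the model card example
ROPE_FREQ_BASE=10000    # value suggested in the reply above
ROPE_FREQ_SCALE=0.5     # value suggested in the reply above

# Same options as the model card example, with the context and RoPE flags changed.
LLAMA_ARGS="-t 10 -ngl 32 -m $MODEL --color -c $CTX --rope-freq-base $ROPE_FREQ_BASE --rope-freq-scale $ROPE_FREQ_SCALE --temp 0.7 --repeat_penalty 1.1 -n -1"

# Print the assembled command so it can be checked before running it.
echo "./main $LLAMA_ARGS"

# To actually run it (needs a llama.cpp build and the model file in this directory):
# ./main $LLAMA_ARGS -p "### Instruction: Write a story about llamas\n### Response:"
```

Printing the command first makes it easy to confirm that -c and both RoPE flags are present before loading a 13B model.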
Great, thanks for confirming and for updating the information.
It worked for me using the equivalent koboldcpp command line options: --contextsize 8192 --ropeconfig 0.5 10000