bhenrym14 committed
Commit 195500b
1 Parent(s): 273c8d0

Update README.md

Files changed (1):
  1. README.md +2 -0
README.md CHANGED
@@ -25,6 +25,8 @@ REQUIRED: you'll need to patch in the appropriate RoPE scaling module. see: [rep
 
 Hopefully there is a quick fix to exllama that can make >8k work soon.
 
+Otherwise, for context <8k, use exllama. Set `max_seq_len` to 16384 and `compress_pos_emb` to 8.
+
 ## Motivation
 Recent advancements in extending context by RoPE scaling ([kaiokendev](https://kaiokendev.github.io/til#extending-context-to-8k) and [Meta AI](https://arxiv.org/abs/2306.15595)) demonstrate the ability to extend the context window without (total) retraining. Finetuning has been shown to be necessary to properly leverage the longer context. Here I attempt to take a smaller model and extend the context to 16k tokens. This, however, proved problematic, as stability suffered in the 8-10k+ range. The Meta paper demonstrated that decreasing perplexities can still be achieved at these context lengths; however, their approach involved tuning all variables on the maximum sequence length after incorporating the RoPE scaling adjustment.
 
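
For reference, below is a minimal sketch of what linear RoPE position interpolation (the effect of setting `compress_pos_emb = 8` with `max_seq_len = 16384`) does to the rotary embedding cache. The function name `build_rope_cache` and its signature are illustrative assumptions, not exllama's API; exllama applies the equivalent scaling internally when `compress_pos_emb` is set.

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0,
                     compress_pos_emb: float = 8.0):
    """Illustrative RoPE cache with linear position interpolation (hypothetical helper)."""
    # Standard RoPE inverse frequencies over the head dimension.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Linear interpolation: divide positions by the compression factor so a
    # 16384-token sequence spans the 0..2048 positional range the base model
    # was pretrained on.
    positions = torch.arange(seq_len).float() / compress_pos_emb
    angles = torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)
    return torch.cos(angles), torch.sin(angles)

# With compress_pos_emb = 8 and max_seq_len = 16384, position 16383 maps to
# ~2047.9, i.e. back inside the original 2048-token pretraining window.
cos, sin = build_rope_cache(seq_len=16384, head_dim=128)
print(cos.shape)  # torch.Size([16384, 64])
```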