frankminors123 committed
Commit 8bcfb35 • 1 Parent: 49ad1ea
Update README.md

README.md CHANGED
````diff
@@ -11,6 +11,8 @@ the base period of rotary positional embeddings (RoPE) from 10000 to 1000000.
 
 We use a sequence length of 1k for pre-training, and continue training at this length during the fine-tuning stage. Based on the larger base period of RoPE, it can support up to 15k context length extrapolation at inference time.
 
+Based on this [dataset](https://huggingface.co/datasets/code_search_net), we calculate the average PPL on 1k-length text to be 5.44. However, this value is 148.70 based on our pre-trained model.
+
 The Chinese prompt template used is as follows:
 ```python
 PROMPT_TEMPLATE = (
````
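The extrapolation claim in the diff's context follows from how RoPE frequencies scale with the base period: raising the base from 10000 to 1000000 slows the rotation of the higher dimensions, stretching their wavelengths far beyond the 1k training length. The sketch below is illustrative only and is not code from this repo; the head dimension of 128 is an assumption:

```python
# Minimal sketch: how the RoPE base period changes per-dimension rotation
# frequencies. inv_freq[i] = base^(-2i/dim) is the standard RoPE schedule.
import math

def rope_inv_freq(dim: int, base: float) -> list[float]:
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

head_dim = 128  # assumed head dimension, for illustration only
for base in (10_000.0, 1_000_000.0):
    inv_freq = rope_inv_freq(head_dim, base)
    # Wavelength (in tokens) of the slowest-rotating dimension:
    max_wavelength = 2 * math.pi / inv_freq[-1]
    print(f"base={base:>9.0f}  slowest wavelength ~ {max_wavelength:,.0f} tokens")
```

With base 10000 the slowest dimension completes a cycle in tens of thousands of tokens; with base 1000000 the cycle length grows by roughly two orders of magnitude, so positions up to 15k stay well inside the smoothly varying range.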
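A measurement like the PPL figures in the added lines could be reproduced along these lines. This is a minimal sketch under stated assumptions, not the authors' evaluation script: the model path is a placeholder, the `python` subset and 100-sample slice of code_search_net are arbitrary choices, and perplexity is averaged per 1k-token window:

```python
# Hypothetical PPL-over-1k-windows measurement; MODEL_PATH is a placeholder.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/model"  # placeholder, not a real checkpoint name
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to(device).eval()

# The "python" config and 100-sample slice are assumptions for illustration.
ds = load_dataset("code_search_net", "python", split="test", trust_remote_code=True)

ppls = []
with torch.no_grad():
    for sample in ds.select(range(100)):
        ids = tok(sample["whole_func_string"], return_tensors="pt",
                  truncation=True, max_length=1024).input_ids.to(device)
        if ids.shape[1] < 2:
            continue  # need at least two tokens for a shifted LM loss
        # labels=input_ids: HF shifts internally; loss is mean NLL per token
        loss = model(ids, labels=ids).loss
        ppls.append(torch.exp(loss).item())

print(f"average PPL over {len(ppls)} windows: {sum(ppls) / len(ppls):.2f}")
```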