Downtown-Case committed
Commit b1156fe (1 parent: 2be1c81)

Update README.md

Files changed (1)
  1. README.md +47 -1
README.md CHANGED
@@ -4,16 +4,62 @@ license_link: https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE
  language:
  - en
  pipeline_tag: text-generation
+ base_model:
+ - Qwen/Qwen2.5-32B
+ library_name: transformers
  ---

- ## THIS QUANTIZATION APPEARS TO BE BROKEN, JUST UPLOADED FOR TESTING
+ # Quantization
+
+ 4.1bpw quantization using default settings, for a good amount of context on a 24GB GPU.
+
+ This is the base model, not instruct! Base models tend to be better for raw completion (like novel continuation), especially at long context.
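The bits-per-weight sizing suggests an ExLlamaV2 (exl2) quant, though the card does not name the format; assuming it is, here is a minimal raw-completion sketch using exllamav2's dynamic generator (the local model path is a placeholder):

```python
# Sketch: load the quant with ExLlamaV2 (assumed format) and run raw completion.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/local/download")  # placeholder: local copy of this repo
config.max_seq_len = 32768  # trade maximum context for free VRAM on a 24GB card

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Base model: hand it text to continue, not a chat-formatted prompt.
print(generator.generate(prompt="The expedition reached the ridge at dawn, and", max_new_tokens=200))
```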
+ # Qwen2.5-32B
+
+ ## Introduction
+
+ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
+
+ - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
+ - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
+ - **Long-context support** for up to 128K tokens, with generation of up to 8K tokens.
+ - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+
+ **This repo contains the base 32B Qwen2.5 model**, which has the following features (see the sketch after this list):
+ - Type: Causal Language Models
+ - Training Stage: Pretraining
+ - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+ - Number of Parameters: 32.5B
+ - Number of Parameters (Non-Embedding): 31.0B
+ - Number of Layers: 64
+ - Number of Attention Heads (GQA): 40 for Q and 8 for KV
+ - Context Length: 131,072 tokens
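These numbers can be cross-checked against the checkpoint's configuration; the field names below follow the standard Qwen2 config in `transformers`, and reading them from the upstream repo is an illustrative assumption:

```python
# Sketch: read the architecture numbers above back from the upstream model config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-32B")
print(cfg.num_hidden_layers)        # 64 layers
print(cfg.num_attention_heads)      # 40 query heads
print(cfg.num_key_value_heads)      # 8 KV heads (GQA)
print(cfg.max_position_embeddings)  # 131,072-token context window
```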
+
+ **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., on this model.
+
+ For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
+
+ ## Requirements
+
+ The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
+
+ With `transformers<4.37.0`, you will encounter the following error:
+ ```
+ KeyError: 'qwen2'
+ ```
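A minimal version check plus raw-completion sketch; it targets the original `Qwen/Qwen2.5-32B` checkpoint for illustration, since the quantized weights in this repo are intended for their own loader rather than plain `transformers`:

```python
# Sketch: confirm the installed transformers knows the "qwen2" architecture,
# then run raw text completion with the full-precision upstream checkpoint.
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.37.0"), (
    f"transformers {transformers.__version__} predates qwen2 support; "
    "upgrade with `pip install -U transformers`"
)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Base model: plain continuation, no chat template.
inputs = tokenizer("The history of cartography begins with", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```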
+
+ ## Evaluation & Performance
+
+ Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5/).
+
+ For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).

  ## Citation

+ If you find our work helpful, feel free to give us a cite.
+
  ```
  @misc{qwen2.5,
      title = {Qwen2.5: A Party of Foundation Models},