zijianhu committed
Commit 6389dde
1 Parent(s): d1c8f57

Update README.md


add reference to tensoropera/Fox-1-1.6B-Instruct-v0.1

Files changed (1):
  1. README.md +12 -5
README.md CHANGED

```diff
@@ -1,18 +1,25 @@
 ---
 license: apache-2.0
 language:
-- en
+- en
 pipeline_tag: text-generation
 ---
 
 ## Model Card for Fox-1-1.6B
 
 > [!IMPORTANT]
-> This model is a base pretrained model which requires further finetuning for most use cases. We will release the instruction-tuned version soon.
+> This model is a base pretrained model which requires further finetuning for most use cases.
+> For a more interactive experience, we
+> recommend [tensoropera/Fox-1-1.6B-Instruct-v0.1](https://huggingface.co/tensoropera/Fox-1-1.6B-Instruct-v0.1), the
+> instruction-tuned version of Fox-1.
 
-Fox-1 is a decoder-only transformer-based small language model (SLM) with 1.6B total parameters developed by [TensorOpera AI](https://tensoropera.ai/). The model was trained with a 3-stage data curriculum on 3 trillion tokens of text and code data in 8K sequence length. Fox-1 uses grouped query attention (GQA) with 4 KV heads and 16 attention heads and has a deeper architecture than other SLMs.
+Fox-1 is a decoder-only transformer-based small language model (SLM) with 1.6B total parameters developed
+by [TensorOpera AI](https://tensoropera.ai/). The model was trained with a 3-stage data curriculum on 3 trillion
+tokens of text and code data in 8K sequence length. Fox-1 uses Grouped Query Attention (GQA) with 4 key-value heads and
+16 attention heads for faster inference.
 
-For the full details of this model please read our [release blog post](https://blog.tensoropera.ai/tensoropera-unveils-fox-foundation-model-a-pioneering-open-source-slm-leading-the-way-against-tech-giants).
+For the full details of this model please read
+our [release blog post](https://blog.tensoropera.ai/tensoropera-unveils-fox-foundation-model-a-pioneering-open-source-slm-leading-the-way-against-tech-giants).
 
 ## Benchmarks
 
@@ -28,4 +35,4 @@ score of the 6 benchmarks. The model was evaluated on a machine with 8*H100 GPUs
 | HellaSwag | 62.82% | 61.55% | 71.60% | 70.46% | 65.23% |
 | TruthfulQA | 38.66% | 39.37% | 33.05% | 38.77% | 36.98% |
 | Winogrande | 60.62% | 65.51% | 65.51% | 65.27% | 61.64% |
-| Average | 47.13% | 46.81% | 46.36% | 45.92% | 38.28% |
+| Average | 47.13% | 46.81% | 46.36% | 45.92% | 38.28% |
```
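For reference, a minimal usage sketch (not part of this commit), assuming the base model described in this card is published under the repo id `tensoropera/Fox-1-1.6B` and loads with the standard `transformers` causal-LM API:

```python
# Hedged sketch: loading the Fox-1 base model via the standard transformers API.
# The repo id "tensoropera/Fox-1-1.6B" is assumed from context; the commit itself
# only confirms the instruct variant "tensoropera/Fox-1-1.6B-Instruct-v0.1".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tensoropera/Fox-1-1.6B"  # assumed repo id for this model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 1.6B parameters fit comfortably on one GPU
    device_map="auto",
)

# Base (non-instruct) model: plain text completion, not chat.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On the architecture numbers in the card: with 16 attention heads sharing 4 key-value heads, each KV head serves 4 query heads, so the KV cache is roughly a quarter the size of an equivalent 16-head multi-head-attention model, which is consistent with the faster-inference claim in the updated text.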