Tags: Text Generation · PEFT · Safetensors · llama-2 · Eval Results
Commit c29c747 by dfurman (1 parent: de8ab98)

Update README.md

Files changed (1): README.md (+1, −1)
README.md CHANGED:

@@ -103,9 +103,9 @@ Example 3:
 The architecture is a modification of a standard decoder-only transformer.
 
 The llama-2-70b models have been modified from a standard transformer in the following ways:
-* It uses [grouped-query attention](https://arxiv.org/pdf/2305.13245.pdf) (GQA), a generalization of multi-query attention which uses an intermediate number of key-value heads.
 * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
 * It uses [rotary positional embeddings](https://arxiv.org/abs/2104.09864) (RoPE)
+* It uses [grouped-query attention](https://arxiv.org/pdf/2305.13245.pdf) (GQA), a generalization of multi-query attention which uses an intermediate number of key-value heads.
 
 | Hyperparameter | Value |
 |----------------|-------|
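For context on the bullet this commit relocates: grouped-query attention shares each key-value head across a group of query heads, interpolating between multi-head attention (one KV head per query head) and multi-query attention (one KV head total). A minimal PyTorch sketch, where the function name and tensor shapes are illustrative assumptions rather than llama-2's actual implementation:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), n_kv_heads dividing n_q_heads
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_q_heads // n_kv_heads
    # Each key-value head serves a whole group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# n_kv_heads == n_q_heads recovers multi-head attention, n_kv_heads == 1
# recovers multi-query attention; GQA is the intermediate case.
q = torch.randn(1, 8, 16, 64)   # 8 query heads
k = torch.randn(1, 2, 16, 64)   # 2 key-value heads, so groups of 4
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)   # (1, 8, 16, 64)
```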
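The SwiGLU bullet refers to a gated feed-forward activation, SwiGLU(x) = Swish(xW1) ⊙ (xW3), typically followed by a down-projection. A minimal sketch assuming the bias-free three-matrix layout common in llama-style blocks (the names w1, w2, w3 are illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    # Computes w2(Swish(w1(x)) * w3(x)); F.silu is Swish with beta = 1.
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```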
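And for the RoPE bullet: rotary positional embeddings rotate each consecutive pair of query/key channels by a position-dependent angle, so attention scores depend on relative position. An illustrative reimplementation, not the model's actual code:

```python
import torch

def apply_rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim) with head_dim even.
    seq, dim = x.shape[-2], x.shape[-1]
    # One rotation frequency per channel pair, as in the RoPE paper.
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)  # (dim/2,)
    angles = torch.arange(seq).float()[:, None] * inv_freq       # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2D rotation of each (x1, x2) pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```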