Commit 944519f by chiyuzhang (1 parent: 7256c50)

Update README.md

Files changed (1): README.md (+82 −0)
README.md CHANGED
@@ -41,6 +41,88 @@ The following hyperparameters were used during training:
  ## Training and evaluation data
  We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().

+ ## Models
+ You can download the LaMini model series listed below. Note that not all models perform equally well; models marked with ✩ achieve the best overall performance given their size/architecture. More details can be found in our paper.
+
+ <table>
+ <caption>
+ LaMini Language Models collection.
+ </caption>
+ <thead>
+ <tr>
+ <th>Name</th>
+ <th>Architecture</th>
+ <th>Initialization</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>LaMini-T5-61M</td>
+ <td>encoder-decoder</td>
+ <td>T5-small</td>
+ </tr>
+ <tr>
+ <td>LaMini-T5-223M</td>
+ <td>encoder-decoder</td>
+ <td>T5-base</td>
+ </tr>
+ <tr>
+ <td>LaMini-T5-738M</td>
+ <td>encoder-decoder</td>
+ <td>T5-large</td>
+ </tr>
+ <tr>
+ <td>LaMini-Flan-T5-77M</td>
+ <td>encoder-decoder</td>
+ <td>Flan-T5-small</td>
+ </tr>
+ <tr>
+ <td>LaMini-Flan-T5-248M</td>
+ <td>encoder-decoder</td>
+ <td>Flan-T5-base</td>
+ </tr>
+ <tr>
+ <td>LaMini-Flan-T5-783M</td>
+ <td>encoder-decoder</td>
+ <td>Flan-T5-large</td>
+ </tr>
+ <tr>
+ <td>LaMini-Cb-111M</td>
+ <td>decoder-only</td>
+ <td>Cerebras-GPT-111M</td>
+ </tr>
+ <tr>
+ <td>LaMini-Cb-256M</td>
+ <td>decoder-only</td>
+ <td>Cerebras-GPT-256M</td>
+ </tr>
+ <tr>
+ <td>LaMini-Cb-590M</td>
+ <td>decoder-only</td>
+ <td>Cerebras-GPT-590M</td>
+ </tr>
+ <tr>
+ <td>LaMini-Cb-1.3B</td>
+ <td>decoder-only</td>
+ <td>Cerebras-GPT-1.3B</td>
+ </tr>
+ <tr>
+ <td>LaMini-GPT-124M</td>
+ <td>decoder-only</td>
+ <td>GPT-2</td>
+ </tr>
+ <tr>
+ <td>LaMini-GPT-774M</td>
+ <td>decoder-only</td>
+ <td>GPT-2 Large</td>
+ </tr>
+ <tr>
+ <td>LaMini-GPT-1.5B</td>
+ <td>decoder-only</td>
+ <td>GPT-2 XL</td>
+ </tr>
+ </tbody>
+ </table>
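
As a rough illustration of how checkpoints like these are typically consumed, the sketch below loads one of the models in the table with the Hugging Face `transformers` pipeline API. The repo id `MBZUAI/LaMini-Flan-T5-248M`, the prompt, and the generation settings are assumptions for demonstration, not taken from this README; decoder-only variants (Cerebras-GPT- and GPT-2-based) would use the `text-generation` task instead of `text2text-generation`.

```python
# Minimal loading sketch (assumed repo id; swap in any name from the table).
from transformers import pipeline

# Encoder-decoder LaMini models (T5 / Flan-T5 based) are text2text models.
generator = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M")

instruction = "Please list three tips for staying healthy."
result = generator(instruction, max_length=256)

print(result[0]["generated_text"])
```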

  ## Use