p208p2002 committed on
Commit 9b3ce7d • 1 Parent(s): 32ef01c

update app.py

Files changed (1)
  1. app.py +1 -1
app.py CHANGED
@@ -18,7 +18,7 @@ some cases we report the loss for specific tokens within the context.
 
 • C ≈ 6ND – an estimate of the total non-embedding training compute
 
-$$E=1.69, A=406.4, \\alpha=0.34, \\beta=0.28$$
+$$E=1.69, A=406.4, B=410.7, \\alpha=0.34, \\beta=0.28$$
 $$C\\approx6DN$$
 $$L(N,D)=E+\\frac{A}{N^\\alpha}+\\frac{B}{D^\\beta}$$
 $$N_{opt}(C),D_{opt}(C)={\\arg\\min}_{N,D\ s.t.\ FLOP/s(N,D)=C}\ L(N,D)$$
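
For readers who want to check the constants this commit touches, here is a minimal Python sketch of the formulas shown in the diff. It is not code from app.py: the function names `loss` and `compute_optimal` are hypothetical, and it assumes the constraint in the arg-min line means C ≈ 6ND, as the diff's own compute estimate states. Under that assumption, substituting D = C/(6N) into L(N,D) and setting the derivative with respect to N to zero gives a closed form for the optimum.

```python
import math

# Constants as they appear on the "+" line of the diff.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Parametric loss L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Return (N_opt, D_opt) minimizing L(N, D) subject to 6 * N * D = flops.

    Assumes the compute constraint is C = 6ND. With D = C / (6N), the
    stationarity condition yields
    N_opt = G * (C / 6) ** (beta / (alpha + beta)),
    where G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    k = flops / 6.0
    n_opt = g * k ** (BETA / (ALPHA + BETA))
    d_opt = k / n_opt  # keeps the budget satisfied: 6 * N * D = flops
    return n_opt, d_opt


if __name__ == "__main__":
    budget = 1e21  # illustrative FLOP budget, not taken from the source
    n, d = compute_optimal(budget)
    print(f"N_opt = {n:.3e}, D_opt = {d:.3e}, L = {loss(n, d):.4f}")
```

Without the B value added in this commit, the third term of L(N,D) cannot be evaluated, which is why the constant line needed the fix.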