tangledgroup/tangled-llama-33m-32k-base-v0.1

@@ -26,6 +26,10 @@ tags:
 ![logo](./misc/logo.png)
 [loss, val_loss](https://api.wandb.ai/links/mtasic85/z591qpyv)
 [val_ppl](https://api.wandb.ai/links/mtasic85/lku5hbws)
@@ -36,83 +40,53 @@ tags:
 ## lm-evaluation-harness
-|                 Tasks                 |Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
-|---------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|arc_challenge                          |      1|none            |     0|acc        |↑  |0.2065|±  |0.0118|
-|                                       |       |none            |     0|acc_norm   |↑  |0.2338|±  |0.0124|
-|gsm8k                                  |      3|flexible-extract|     5|exact_match|↑  |0.0144|±  |0.0033|
-|                                       |       |strict-match    |     5|exact_match|↑  |0.0000|±  |0.0000|
-|hellaswag                              |      1|none            |     0|acc        |↑  |0.2614|±  |0.0044|
-|                                       |       |none            |     0|acc_norm   |↑  |0.2604|±  |0.0044|
-|mmlu                                   |      2|none            |      |acc        |↑  |0.2280|±  |0.0035|
-| - humanities                          |      2|none            |      |acc        |↑  |0.2376|±  |0.0062|
-|  - formal_logic                       |      1|none            |     0|acc        |↑  |0.2698|±  |0.0397|
-|  - high_school_european_history       |      1|none            |     0|acc        |↑  |0.2000|±  |0.0312|
-|  - high_school_us_history             |      1|none            |     0|acc        |↑  |0.2500|±  |0.0304|
-|  - high_school_world_history          |      1|none            |     0|acc        |↑  |0.2785|±  |0.0292|
-|  - international_law                  |      1|none            |     0|acc        |↑  |0.2149|±  |0.0375|
-|  - jurisprudence                      |      1|none            |     0|acc        |↑  |0.2407|±  |0.0413|
-|  - logical_fallacies                  |      1|none            |     0|acc        |↑  |0.2270|±  |0.0329|
-|  - moral_disputes                     |      1|none            |     0|acc        |↑  |0.2341|±  |0.0228|
-|  - moral_scenarios                    |      1|none            |     0|acc        |↑  |0.2380|±  |0.0142|
-|  - philosophy                         |      1|none            |     0|acc        |↑  |0.1833|±  |0.0220|
-|  - prehistory                         |      1|none            |     0|acc        |↑  |0.2160|±  |0.0229|
-|  - professional_law                   |      1|none            |     0|acc        |↑  |0.2399|±  |0.0109|
-|  - world_religions                    |      1|none            |     0|acc        |↑  |0.3275|±  |0.0360|
-| - other                               |      2|none            |      |acc        |↑  |0.2481|±  |0.0077|
-|  - business_ethics                    |      1|none            |     0|acc        |↑  |0.3100|±  |0.0465|
-|  - clinical_knowledge                 |      1|none            |     0|acc        |↑  |0.2264|±  |0.0258|
-|  - college_medicine                   |      1|none            |     0|acc        |↑  |0.2486|±  |0.0330|
-|  - global_facts                       |      1|none            |     0|acc        |↑  |0.1700|±  |0.0378|
-|  - human_aging                        |      1|none            |     0|acc        |↑  |0.3229|±  |0.0314|
-|  - management                         |      1|none            |     0|acc        |↑  |0.1845|±  |0.0384|
-|  - marketing                          |      1|none            |     0|acc        |↑  |0.3034|±  |0.0301|
-|  - medical_genetics                   |      1|none            |     0|acc        |↑  |0.3100|±  |0.0465|
-|  - miscellaneous                      |      1|none            |     0|acc        |↑  |0.2401|±  |0.0153|
-|  - nutrition                          |      1|none            |     0|acc        |↑  |0.2418|±  |0.0245|
-|  - professional_accounting            |      1|none            |     0|acc        |↑  |0.2411|±  |0.0255|
-|  - professional_medicine              |      1|none            |     0|acc        |↑  |0.1838|±  |0.0235|
-|  - virology                           |      1|none            |     0|acc        |↑  |0.2831|±  |0.0351|
-| - social sciences                     |      2|none            |      |acc        |↑  |0.2155|±  |0.0074|
-|  - econometrics                       |      1|none            |     0|acc        |↑  |0.2281|±  |0.0395|
-|  - high_school_geography              |      1|none            |     0|acc        |↑  |0.1768|±  |0.0272|
-|  - high_school_government_and_politics|      1|none            |     0|acc        |↑  |0.1969|±  |0.0287|
-|  - high_school_macroeconomics         |      1|none            |     0|acc        |↑  |0.2103|±  |0.0207|
-|  - high_school_microeconomics         |      1|none            |     0|acc        |↑  |0.2143|±  |0.0267|
-|  - high_school_psychology             |      1|none            |     0|acc        |↑  |0.1872|±  |0.0167|
-|  - human_sexuality                    |      1|none            |     0|acc        |↑  |0.2672|±  |0.0388|
-|  - professional_psychology            |      1|none            |     0|acc        |↑  |0.2467|±  |0.0174|
-|  - public_relations                   |      1|none            |     0|acc        |↑  |0.2182|±  |0.0396|
-|  - security_studies                   |      1|none            |     0|acc        |↑  |0.1755|±  |0.0244|
-|  - sociology                          |      1|none            |     0|acc        |↑  |0.2438|±  |0.0304|
-|  - us_foreign_policy                  |      1|none            |     0|acc        |↑  |0.2700|±  |0.0446|
-| - stem                                |      2|none            |      |acc        |↑  |0.2058|±  |0.0072|
-|  - abstract_algebra                   |      1|none            |     0|acc        |↑  |0.1700|±  |0.0378|
-|  - anatomy                            |      1|none            |     0|acc        |↑  |0.1778|±  |0.0330|
-|  - astronomy                          |      1|none            |     0|acc        |↑  |0.1776|±  |0.0311|
-|  - college_biology                    |      1|none            |     0|acc        |↑  |0.2569|±  |0.0365|
-|  - college_chemistry                  |      1|none            |     0|acc        |↑  |0.1900|±  |0.0394|
-|  - college_computer_science           |      1|none            |     0|acc        |↑  |0.2600|±  |0.0441|
-|  - college_mathematics                |      1|none            |     0|acc        |↑  |0.2100|±  |0.0409|
-|  - college_physics                    |      1|none            |     0|acc        |↑  |0.2059|±  |0.0402|
-|  - computer_security                  |      1|none            |     0|acc        |↑  |0.2400|±  |0.0429|
-|  - conceptual_physics                 |      1|none            |     0|acc        |↑  |0.2681|±  |0.0290|
-|  - electrical_engineering             |      1|none            |     0|acc        |↑  |0.2345|±  |0.0353|
-|  - elementary_mathematics             |      1|none            |     0|acc        |↑  |0.2011|±  |0.0206|
-|  - high_school_biology                |      1|none            |     0|acc        |↑  |0.1806|±  |0.0219|
-|  - high_school_chemistry              |      1|none            |     0|acc        |↑  |0.1478|±  |0.0250|
-|  - high_school_computer_science       |      1|none            |     0|acc        |↑  |0.2400|±  |0.0429|
-|  - high_school_mathematics            |      1|none            |     0|acc        |↑  |0.2111|±  |0.0249|
-|  - high_school_physics                |      1|none            |     0|acc        |↑  |0.1788|±  |0.0313|
-|  - high_school_statistics             |      1|none            |     0|acc        |↑  |0.1620|±  |0.0251|
-|  - machine_learning                   |      1|none            |     0|acc        |↑  |0.2768|±  |0.0425|
-|truthfulqa_mc2                         |      2|none            |     0|acc        |↑  |0.4975|±  |0.0165|
-|winogrande                             |      1|none            |     0|acc        |↑  |0.5146|±  |0.0140|
-|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
-|------------------|------:|------|------|------|---|-----:|---|-----:|
-|mmlu              |      2|none  |      |acc   |↑  |0.2280|±  |0.0035|
-| - humanities     |      2|none  |      |acc   |↑  |0.2376|±  |0.0062|
-| - other          |      2|none  |      |acc   |↑  |0.2481|±  |0.0077|
-| - social sciences|      2|none  |      |acc   |↑  |0.2155|±  |0.0074|
-| - stem           |      2|none  |      |acc   |↑  |0.2058|±  |0.0072|

 ![logo](./misc/logo.png)
+A pretrained language model based on the Llama architecture, with about **33M** parameters. It was trained on **9.7B** (`9,782,206,713`) tokens from more than **5.2M** (`5,285,575`) dataset rows.
+This model is not intended for immediate use; it is meant as a base for continued pretraining and finetuning on a downstream task. Although it can handle a context length of up to **32K** (`32,768`) tokens, it was pretrained with sequences of **2K** (`2,048`) tokens.
 [loss, val_loss](https://api.wandb.ai/links/mtasic85/z591qpyv)
 [val_ppl](https://api.wandb.ai/links/mtasic85/lku5hbws)
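The training figures quoted in the description can be put in proportion with a little arithmetic. The following is a minimal sketch, not part of the original model card: the constant and function names are hypothetical, the parameter count uses the rounded "about 33M" figure, and the sequence count assumes tokens were packed back to back into 2,048-token sequences, which the card does not state.

```python
# Figures quoted in the model description.
TOTAL_TOKENS = 9_782_206_713
DATASET_ROWS = 5_285_575
PRETRAIN_SEQ_LEN = 2_048
MAX_CONTEXT_LEN = 32_768
# Assumption: "about 33M" rounded; the exact parameter count is not given.
APPROX_PARAMS = 33_000_000


def training_summary():
    """Derive a few ratios from the quoted training figures."""
    return {
        # Mean number of tokens contributed by one dataset row.
        "avg_tokens_per_row": TOTAL_TOKENS / DATASET_ROWS,
        # Full 2,048-token sequences, assuming perfect packing (an assumption).
        "full_sequences": TOTAL_TOKENS // PRETRAIN_SEQ_LEN,
        # Training tokens seen per model parameter (approximate).
        "tokens_per_param": TOTAL_TOKENS / APPROX_PARAMS,
        # How much longer the maximum context is than the pretraining length.
        "context_ratio": MAX_CONTEXT_LEN // PRETRAIN_SEQ_LEN,
    }


if __name__ == "__main__":
    for name, value in training_summary().items():
        print(f"{name}: {value:,.2f}")
```

The context ratio shows that the maximum context is 16 times the pretraining sequence length, so behaviour beyond 2,048 tokens is something to verify during continued pretraining rather than assume.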
 ## lm-evaluation-harness
+|                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
+|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
+|leaderboard                                                |    N/A|      |      |                       |   |      |   |      |
+| - leaderboard_bbh                                         |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_bbh_boolean_expressions                    |      1|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
+|  - leaderboard_bbh_causal_judgement                       |      1|none  |     3|acc_norm               |↑  |0.5187|±  |0.0366|
+|  - leaderboard_bbh_date_understanding                     |      1|none  |     3|acc_norm               |↑  |0.1600|±  |0.0232|
+|  - leaderboard_bbh_disambiguation_qa                      |      1|none  |     3|acc_norm               |↑  |0.3000|±  |0.0290|
+|  - leaderboard_bbh_formal_fallacies                       |      1|none  |     3|acc_norm               |↑  |0.4680|±  |0.0316|
+|  - leaderboard_bbh_geometric_shapes                       |      1|none  |     3|acc_norm               |↑  |0.0400|±  |0.0124|
+|  - leaderboard_bbh_hyperbaton                             |      1|none  |     3|acc_norm               |↑  |0.5240|±  |0.0316|
+|  - leaderboard_bbh_logical_deduction_five_objects         |      1|none  |     3|acc_norm               |↑  |0.2240|±  |0.0264|
+|  - leaderboard_bbh_logical_deduction_seven_objects        |      1|none  |     3|acc_norm               |↑  |0.1480|±  |0.0225|
+|  - leaderboard_bbh_logical_deduction_three_objects        |      1|none  |     3|acc_norm               |↑  |0.3160|±  |0.0295|
+|  - leaderboard_bbh_movie_recommendation                   |      1|none  |     3|acc_norm               |↑  |0.2560|±  |0.0277|
+|  - leaderboard_bbh_navigate                               |      1|none  |     3|acc_norm               |↑  |0.4200|±  |0.0313|
+|  - leaderboard_bbh_object_counting                        |      1|none  |     3|acc_norm               |↑  |0.0680|±  |0.0160|
+|  - leaderboard_bbh_penguins_in_a_table                    |      1|none  |     3|acc_norm               |↑  |0.1918|±  |0.0327|
+|  - leaderboard_bbh_reasoning_about_colored_objects        |      1|none  |     3|acc_norm               |↑  |0.1160|±  |0.0203|
+|  - leaderboard_bbh_ruin_names                             |      1|none  |     3|acc_norm               |↑  |0.2000|±  |0.0253|
+|  - leaderboard_bbh_salient_translation_error_detection    |      1|none  |     3|acc_norm               |↑  |0.1320|±  |0.0215|
+|  - leaderboard_bbh_snarks                                 |      1|none  |     3|acc_norm               |↑  |0.4719|±  |0.0375|
+|  - leaderboard_bbh_sports_understanding                   |      1|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
+|  - leaderboard_bbh_temporal_sequences                     |      1|none  |     3|acc_norm               |↑  |0.2880|±  |0.0287|
+|  - leaderboard_bbh_tracking_shuffled_objects_five_objects |      1|none  |     3|acc_norm               |↑  |0.1880|±  |0.0248|
+|  - leaderboard_bbh_tracking_shuffled_objects_seven_objects|      1|none  |     3|acc_norm               |↑  |0.1280|±  |0.0212|
+|  - leaderboard_bbh_tracking_shuffled_objects_three_objects|      1|none  |     3|acc_norm               |↑  |0.3040|±  |0.0292|
+|  - leaderboard_bbh_web_of_lies                            |      1|none  |     3|acc_norm               |↑  |0.4880|±  |0.0317|
+| - leaderboard_gpqa                                        |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_gpqa_diamond                               |      1|none  |     0|acc_norm               |↑  |0.2222|±  |0.0296|
+|  - leaderboard_gpqa_extended                              |      1|none  |     0|acc_norm               |↑  |0.2381|±  |0.0182|
+|  - leaderboard_gpqa_main                                  |      1|none  |     0|acc_norm               |↑  |0.2478|±  |0.0204|
+| - leaderboard_ifeval                                      |      3|none  |     0|inst_level_loose_acc   |↑  |0.1859|±  |   N/A|
+|                                                           |       |none  |     0|inst_level_strict_acc  |↑  |0.1811|±  |   N/A|
+|                                                           |       |none  |     0|prompt_level_loose_acc |↑  |0.0906|±  |0.0124|
+|                                                           |       |none  |     0|prompt_level_strict_acc|↑  |0.0832|±  |0.0119|
+| - leaderboard_math_hard                                   |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_math_algebra_hard                          |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_counting_and_prob_hard                |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_geometry_hard                         |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_intermediate_algebra_hard             |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_num_theory_hard                       |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_prealgebra_hard                       |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_precalculus_hard                      |      1|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+| - leaderboard_mmlu_pro                                    |    0.1|none  |     5|acc                    |↑  |0.1100|±  |0.0029|
+| - leaderboard_musr                                        |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_musr_murder_mysteries                      |      1|none  |     0|acc_norm               |↑  |0.5000|±  |0.0317|
+|  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2930|±  |0.0285|
+|  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.3720|±  |0.0306|
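The table reports per-subtask scores but no value for group rows such as `leaderboard_musr`. As a rough illustration, the sketch below parses rows in this table layout and takes an unweighted mean over one group. The helper names `parse_row` and `group_mean` are hypothetical, and the unweighted mean is an assumption made for illustration: it is not necessarily how lm-evaluation-harness or any leaderboard aggregates these scores.

```python
from statistics import mean

# Three rows copied verbatim from the results table (diff "+" prefix included).
ROWS = [
    "+|  - leaderboard_musr_murder_mysteries                      |      1|none  |     0|acc_norm               |↑  |0.5000|±  |0.0317|",
    "+|  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2930|±  |0.0285|",
    "+|  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.3720|±  |0.0306|",
]


def parse_row(line):
    """Parse one table row; return None for headers, rules, and group rows."""
    cells = [c.strip() for c in line.lstrip("+").strip().strip("|").split("|")]
    if len(cells) != 9:
        return None
    task, _version, _filter, _nshot, metric, _arrow, value, _pm, _stderr = cells
    try:
        score = float(value)
    except ValueError:
        # Header, separator, or a group row with an empty Value cell.
        return None
    return {"task": task.lstrip("- "), "metric": metric, "value": score}


def group_mean(lines, prefix, metric="acc_norm"):
    """Unweighted mean of `metric` over tasks whose name starts with `prefix`."""
    parsed = [r for r in map(parse_row, lines) if r is not None]
    scores = [r["value"] for r in parsed
              if r["task"].startswith(prefix) and r["metric"] == metric]
    return mean(scores) if scores else None


if __name__ == "__main__":
    print(f"{group_mean(ROWS, 'leaderboard_musr'):.4f}")
```

Because the subtasks have different numbers of examples, a sample-weighted average could differ from this figure.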