AALF committed on
Commit
27f1521
1 Parent(s): cc4b0f3

Update README.md

Files changed (1)
  1. README.md +1 -11
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
  - generated_from_trainer
  ---
 
- # gemma-2-27b-it-simpo-beta10-gamma5-lr8e-7
+ # gemma-2-27b-it-SimPO-37K Model Card
 
  ## Implementation Details
  We first followed the [SimPO](https://github.com/princeton-nlp/SimPO) framework to apply [On-Policy Preference Data Generation](https://github.com/princeton-nlp/SimPO/tree/main/on_policy_data_gen) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset using the [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) model. We then selected prompts where the chosen reward was at least 0.01 higher than the rejected reward, resulting in 37,040 training data points.
@@ -77,14 +77,4 @@ UltraFeedback paper:
  journal={arXiv preprint arXiv:2310.01377},
  year={2023}
  }
- ```
-
- ArmoRM paper:
- ```
- @article{wang2024interpretable,
- title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
- author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
- journal={arXiv preprint arXiv:2406.12845},
- year={2024}
- }
  ```
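
The reward-margin selection described under Implementation Details (keep a prompt only when the chosen reward exceeds the rejected reward by at least 0.01) could be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the field names `chosen_reward` and `rejected_reward` and the helper `filter_by_reward_margin` are hypothetical, and only the 0.01 threshold comes from the README.

```python
from typing import Dict, List

# Threshold stated in the README; everything else here is an assumption.
REWARD_MARGIN = 0.01


def filter_by_reward_margin(
    pairs: List[Dict[str, float]], margin: float = REWARD_MARGIN
) -> List[Dict[str, float]]:
    """Keep preference pairs whose chosen reward beats the rejected one by >= margin.

    The keys "chosen_reward" and "rejected_reward" are hypothetical field names.
    """
    return [
        pair
        for pair in pairs
        if pair["chosen_reward"] - pair["rejected_reward"] >= margin
    ]


# Toy records for illustration only; these are not real reward scores.
example_pairs = [
    {"chosen_reward": 0.20, "rejected_reward": 0.10},    # margin 0.10, kept
    {"chosen_reward": 0.105, "rejected_reward": 0.100},  # margin 0.005, dropped
    {"chosen_reward": 0.12, "rejected_reward": 0.15},    # negative margin, dropped
]

kept = filter_by_reward_margin(example_pairs)
```

Applied to the full on-policy preference set, a filter of this shape would presumably yield the 37,040 training pairs the README reports, though the exact scoring and comparison details are not shown in this commit.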