
Qwen2-7B-Instruct-SPPO-Function-call-v2.11

This model is an SPPO preference-tuned version of Qwen2-7B-Instruct (the training dataset is not recorded in the card metadata). It achieves the following results on the evaluation set; a brief usage sketch follows the metrics list:

  • Loss: 0.1457
  • Rewards/chosen: -1.7639
  • Rewards/rejected: -14.1509
  • Rewards/accuracies: 0.9364
  • Rewards/margins: 12.3871
  • Logps/rejected: -551.2230
  • Logps/chosen: -189.1563
  • Logits/rejected: -1.6081
  • Logits/chosen: -1.5770
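
The snippet below is a minimal inference sketch, assuming the standard transformers causal-LM and chat-template interface used by Qwen2-based models. The repository ID is a placeholder, not the card's actual Hub path, and should be replaced with the real one.

```python
# Minimal inference sketch (assumption: standard transformers chat-template interface;
# the repo ID below is a placeholder, not the actual Hub path of this checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/Qwen2-7B-Instruct-SPPO-Function-call-v2.11"  # placeholder Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the released weights are stored in BF16
    device_map="auto",           # requires the `accelerate` package
)

messages = [{"role": "user", "content": "What's the weather in Paris today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```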

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
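
The hyperparameters above map directly onto transformers.TrainingArguments. The sketch below is an illustrative reconstruction, not the actual training script: the card does not say which trainer was used (the reward/logps metrics suggest a TRL-style preference trainer), and a per-device batch size of 2 on 8 GPUs yields the listed total batch size of 16 with no gradient accumulation.

```python
# Illustrative reconstruction of the listed hyperparameters as transformers.TrainingArguments.
# Sketch only: the real training script and trainer class are not given in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2-7b-instruct-sppo-function-call",  # placeholder
    learning_rate=5e-7,
    per_device_train_batch_size=2,   # x 8 GPUs -> total train batch size 16
    per_device_eval_batch_size=2,    # x 8 GPUs -> total eval batch size 16
    gradient_accumulation_steps=1,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # the released weights are BF16
)
```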

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2001 | 0.1145 | 250 | 0.2192 | 0.7210 | -1.8684 | 0.9162 | 2.5895 | -305.5732 | -139.4582 | -1.6566 | -1.7096 |
| 0.1246 | 0.2290 | 500 | 0.1662 | 0.6780 | -4.7708 | 0.9277 | 5.4487 | -363.6193 | -140.3193 | -1.6309 | -1.6619 |
| 0.0831 | 0.3436 | 750 | 0.1441 | 0.5794 | -6.0728 | 0.9191 | 6.6521 | -389.6595 | -142.2913 | -1.6015 | -1.6194 |
| 0.0698 | 0.4581 | 1000 | 0.1458 | -0.1931 | -8.1002 | 0.9335 | 7.9071 | -430.2079 | -157.7405 | -1.6062 | -1.6142 |
| 0.0872 | 0.5726 | 1250 | 0.1416 | -0.0252 | -8.5014 | 0.9393 | 8.4762 | -438.2315 | -154.3822 | -1.5572 | -1.5535 |
| 0.0547 | 0.6871 | 1500 | 0.1330 | -0.4963 | -9.4547 | 0.9335 | 8.9584 | -457.2992 | -163.8050 | -1.5598 | -1.5574 |
| 0.1092 | 0.8016 | 1750 | 0.1337 | -1.2236 | -10.3660 | 0.9277 | 9.1424 | -475.5235 | -178.3509 | -1.5822 | -1.5827 |
| 0.1109 | 0.9162 | 2000 | 0.1190 | -0.4262 | -9.6091 | 0.9364 | 9.1829 | -460.3859 | -162.4036 | -1.5682 | -1.5631 |
| 0.013 | 1.0307 | 2250 | 0.1355 | -0.4415 | -10.4543 | 0.9393 | 10.0128 | -477.2908 | -162.7087 | -1.5520 | -1.5425 |
| 0.0107 | 1.1452 | 2500 | 0.1450 | -1.2114 | -11.9528 | 0.9393 | 10.7414 | -507.2599 | -178.1073 | -1.5666 | -1.5494 |
| 0.0203 | 1.2597 | 2750 | 0.1424 | -1.2291 | -12.7381 | 0.9364 | 11.5090 | -522.9661 | -178.4617 | -1.5798 | -1.5536 |
| 0.0128 | 1.3743 | 3000 | 0.1428 | -1.5064 | -13.4244 | 0.9393 | 11.9180 | -536.6923 | -184.0067 | -1.5982 | -1.5679 |
| 0.0447 | 1.4888 | 3250 | 0.1490 | -1.6333 | -13.8914 | 0.9422 | 12.2581 | -546.0324 | -186.5450 | -1.6084 | -1.5768 |
| 0.0114 | 1.6033 | 3500 | 0.1508 | -1.8097 | -14.2168 | 0.9393 | 12.4071 | -552.5399 | -190.0730 | -1.6144 | -1.5842 |
| 0.0201 | 1.7178 | 3750 | 0.1447 | -1.7474 | -14.1355 | 0.9393 | 12.3881 | -550.9136 | -188.8267 | -1.6087 | -1.5784 |
| 0.0139 | 1.8323 | 4000 | 0.1461 | -1.7396 | -14.1065 | 0.9393 | 12.3669 | -550.3343 | -188.6715 | -1.6088 | -1.5783 |
| 0.0038 | 1.9469 | 4250 | 0.1457 | -1.7639 | -14.1509 | 0.9364 | 12.3871 | -551.2230 | -189.1563 | -1.6081 | -1.5770 |
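
The reward columns follow the convention of DPO/SPPO-style preference trainers: Rewards/margins is the mean difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of pairs where the chosen response receives the higher reward. The final row can be checked directly:

```python
# Sanity check on the final evaluation row (values copied from the table above).
rewards_chosen = -1.7639
rewards_rejected = -14.1509
print(rewards_chosen - rewards_rejected)  # 12.3870, matching Rewards/margins = 12.3871 up to rounding
```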

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1