[2024-07-15 08:06:36,677] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-15 08:08:35,144] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-15 08:08:35,144] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2024-07-15 08:08:35 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='/share/home/models/CHT-Mistral-7B/CHT-6/', model_revision='main', model_code_revision=None, torch_dtype=None, tokenizer_name_or_path=None, trust_remote_code=False, use_flash_attention_2=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False, bnb_4bit_quant_storage='uint8')
2024-07-15 08:08:35 - INFO - __main__ - Data parameters DataArguments(chat_template=None, text_column='text', preprocessing_num_workers=12, truncation_side=None, auto_insert_empty_system_msg=True, train_file='data/training_ptt_dpo-include_id-converted.jsonl', validation_file='data/testing_ptt_dpo_include_id-converted.jsonl', cache_dir='cache_dir')
2024-07-15 08:08:35 - INFO - __main__ - Training/evaluation parameters DPOConfig(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
beta=0.01,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
dataset_num_proc=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_dropout=True,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=100,
eval_strategy=steps,
evaluation_strategy=steps,
f_alpha_divergence_coef=1.0,
f_divergence_type=FDivergenceType.REVERSE_KL,
force_use_ref_model=False,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generate_during_eval=False,
gradient_accumulation_steps=2,
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant': False},
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_model_revision=main,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
is_encoder_decoder=None,
jit_mode_eval=False,
label_names=None,
label_pad_token_id=-100,
label_smoothing=0,
label_smoothing_factor=0.0,
learning_rate=5e-07,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=results/DPO/output_dir/runs/Jul15_08-08-35_CKIP-LLM-VM2,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
loss_type=sigmoid,
lr_scheduler_kwargs={},
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_length=1024,
max_prompt_length=512,
max_steps=-1,
max_target_length=None,
metric_for_best_model=None,
model_adapter_name=None,
model_init_kwargs=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=DPO/CHT-6_20240715-1255,
overwrite_output_dir=False,
padding_value=None,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
precompute_ref_log_probs=False,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
ref_adapter_name=None,
ref_model_init_kwargs=None,
ref_model_mixup_alpha=0.9,
ref_model_sync_steps=64,
reference_free=False,
remove_unused_columns=False,
report_to=['tensorboard', 'wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
rpo_alpha=None,
run_name=None,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=100,
save_strategy=steps,
save_total_limit=1,
seed=42,
skip_memory_metrics=True,
split_batches=None,
sync_ref_model=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
truncation_mode=keep_end,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.1,
warmup_steps=0,
weight_decay=0.0,
)
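A few of the settings above interact: the effective optimisation batch size is per_device_train_batch_size × gradient_accumulation_steps × number of ranks. A minimal sketch in plain Python (the helper name is hypothetical; values are taken from this dump and the four ranks visible in the log):

```python
def effective_batch_size(per_device: int, grad_accum: int, world_size: int) -> int:
    # per_device_train_batch_size=8, gradient_accumulation_steps=2,
    # four GPUs -> 8 * 2 * 4 = 64
    return per_device * grad_accum * world_size

print(effective_batch_size(8, 2, 4))  # 64
```

This matches the train_batch_size=64 that DeepSpeed later reports in its engine configuration.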
DatasetDict({
train: Dataset({
features: ['prompt', 'chosen', 'rejected', 'messages'],
num_rows: 1728
})
test: Dataset({
features: ['prompt', 'chosen', 'rejected', 'messages'],
num_rows: 91
})
})
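The splits printed above can be summarised as plain dicts (a hypothetical sketch, not the `datasets` API — only the feature names and row counts are taken from the log):

```python
# Shape of the DatasetDict printed above, as plain Python data.
splits = {
    "train": {"features": ["prompt", "chosen", "rejected", "messages"], "num_rows": 1728},
    "test":  {"features": ["prompt", "chosen", "rejected", "messages"], "num_rows": 91},
}
total_rows = sum(s["num_rows"] for s in splits.values())
print(total_rows)  # 1819
```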
2024-07-15 08:08:37 - INFO - __main__ - Decontaminated 0 (0.00%) samples from the training set.
2024-07-15 08:08:37 - INFO - __main__ - Prompt sample 1309 of the raw training set:
<s>Human:
文章:輿情作者,日期,輿情內文
fspt01,6/20,1.近期晚上4G行動網路常常中斷,從110/05/28開始,特別開始記錄4G網路品質。 我也不是沒有先跟中華電信客服人員反應遇到的障礙,但畢竟是跟固定設備投資 有關,客服人員、機房人員只能敷衍了事。 2.一直以來,很少關心公民議題,這次被搞到一肚子火,認真思考有那些機構,可以 讓我們老百姓反應中華電信壟斷性議題。 網路上看到這則, http://www.ftc.gov.tw/uploadDecision/dd3d922b-bc37-4911-8141-b42779405ab4.pdf 我是看不太懂,大概了解「公平交易委員會」處分「中華電信股份有限公司」新台幣 500萬元罰鍰。深深感到,這或許是一條路。 3.因為我不懂很多東西,希望各個專業領域版友能提供一些資訊。
ebbf,06/20 23:53,有人壓著你只能選種花?
bcd21086999,06/21 00:22,噓
cv17,06/21 00:24,???
obuibo2000,06/21 00:26,任何室內網路不良的,我只推固網.
obuibo2000,06/21 00:26,要賭便宜有便宜的作法,地點好或是挑業者。
ufoubjlbotp,06/21 00:36,人家罰固網廣告不實,與壟斷何干
kbto4560,06/21 00:38,中華電信問題:使用者問題
kbto4560,06/21 00:38,XXX電信問題:誰叫你要用XXX
djuzqpsu,06/21 00:41,光世代的廣告跟4G訊號有啥關係???
gvlbp21,06/21 01:02,呵...為何不攜碼出去?
qbvm40807,06/21 01:10,真的爛要求無條件解約就好了 台灣有5間電信 比中國
qbvm40807,06/21 01:10,還多欸 更不用說背後也都是獨立的大財團 說壟斷? 你
qbvm40807,06/21 01:10,是認真的嗎???
lojhiu326,06/21 01:32,扯到壟斷就太過了XD 當其他電信塑膠?
lplpllp416,06/21 01:38,去申請別家試用看看,哪家情況好就跳哪家,你這種
lplpllp416,06/21 01:38,情況下為何執著於同一家…
sa759,06/21 01:50,壟...壟斷4G?
TvohIzvo,06/21 02:04,推!權益是爭取來的!
Bovunjbp,06/21 02:46,不是湊足關鍵字就能遮掩你是想公審中華電信的奧客
zffpvp,06/21 03:02,中華4G真的爛 標5G亂喊價 把4G用戶拖下水
bsjbeof,06/21 04:39,也許你家是其他業者黃金地段 快跳槽吧 這沒壟斷 XD
bsjbeof,06/21 04:43,種花也許在你家地段受到其他抗議而放棄但不構成壟斷
nbocpx77,06/21 06:26,你先翻開合約好好看一下 就不會想去浪費這個時間了
Epsffo1010,06/21 06:55,.........
ay27771531,06/21 07:20,所以你要解釋那裡壟斷了嗎
Gn4o,06/21 07:28,……
ex7931425,06/21 07:58,中華電信壟斷?
h6i630,06/21 09:26,你不是只有中華可選,何來壟斷=.=?
hbnfljoh,06/21 10:19,不懂的東西你可以查清楚了在上來發文 顆顆
tpwb0809,06/21 11:17,測智商的發文
SD8377,06/21 11:50,什麼啦
NfohYjbo,06/21 11:56,中華4g哪有壟斷,原po是不是搞錯什麼了
kvtujo332805,06/21 12:07,過了超過12小時沒回應
kvtujo332805,06/21 12:07,…感覺又是一篇射後不理文了
ol950357,06/21 12:09,???????????壟斷在那裡???
tpoza700,06/21 12:19,?
cfopbjdp,06/21 12:33,台灣鯛好吃
bqqmfxbsn,06/21 14:19,呃呃呃,該從哪裡開始吐槽
ibolthp2012,06/21 15:39,這叫公民議題?
nfqbmjoh,06/21 16:12,請先理解甚麼是行動網路跟固定網路好嗎 別的先不說
nfqbmjoh,06/21 16:12,拿光世代的資料比行動網路???????
tijofMpwf,06/21 18:41,這是公民議題?
dbnfmmbmb,06/22 03:05,…
djnpof : 認真回,06/22 10:02,能歸責於種花的大概只有網路太多人使用以致
djnpof,06/22 10:05,網路中斷 其他很難要歸責種花 (干擾或訊號不佳等)
djnpof,06/22 10:08,曾有聽過有人因鄰居晚間使用電器而造成網路中斷的
djnpof,06/22 10:09,像這種狀況的 是種花的問題嗎?
bmfwfmnbo,06/22 11:25,疫情開始後,網路確實慢很多,共體時艱吧
===
問題:請對以上討論進行所有意見的分類與摘要,類別有六種: 建議、質疑、疑問、抱怨、讚美、提供資訊。意見摘要用條列的方式呈現,每條意見之後用括號包含持此意見的發言者,原本文章作者的意見也必須一併分析,同一條意見可能包含不只一人。
2024-07-15 08:08:37 - INFO - __main__ - Chosen sample 1309 of the raw training set:
Assistant:
建議:
- 建議可以換其他電信業者,並試試其他業者的服務品質如何 (ebbf,obuibo2000,lplpllp416,bsjbeof)
- 建議對方先翻開並理解合約內容,便不會浪費時間 (nbocpx77)
- 固定網路比較不會出現問題,不應該使用行動網路來取代 (obuibo2000)
- 如果真的覺得爛到無法忍受,可以請求無條件解約 (qbvm40807)
質疑:
- 中華電信的問題被視為用戶問題,其他電信的問題則是因為選擇該電信業者 (kbto4560)
- 中華電信4G和5G的問題和光世代的廣告無關 (djuzqpsu)
- 對於原po以壟斷來批評中華電信的用詞有疑問 (ufoubjlbotp, qbvm40807, lojhiu326, sa759, ex7931425,h6i630,NfohYjbo)
- 懷疑原po是個奧客,想要公審中華電信 (Bovunjbp)
- 懷疑原po沒有好好理解公平交易法和壟斷的意義 (hbnfljoh)
- 懷疑原po發完文後就沒有回應,可能是在故意搞事 (kvtujo332805)
疑問:
- 原po所說的壟斷是指什麼?為何認為中華電信壟斷了行動網路市場?(ay27771531, lojhiu326,NfohYjbo,ol950357,h6i630)
- 不理解原po為什麼堅持使用同一家信號不好的電信業者而不換家試試 (lplpllp416,Gn4o)
抱怨:
- 抱怨中華電信4G的不穩定,標5G價格卻又拖累4G用戶 (fspt01, zffpvp)
讚美:
- 原po對公民議題的態度值得肯定。 (TvohIzvo)
提供資訊:
- 台灣有五家電信業者,背後都有大財團支援,沒有壟斷的問題 (qbvm40807)
- 台灣的網路使用量在疫情期間增加,速度變慢是共體時艱 (bmfwfmnbo)
- 網路使用時段、設備種類、位置等因素都可能影響網路品質,並非完全由電信業者控制 (djnpof)</s>
2024-07-15 08:08:37 - INFO - __main__ - Rejected sample 1309 of the raw training set:
Assistant:
建議:
1. 應該讓老百姓反應中華電信壟斷性議題 (fspt01)
2. 應該攜碼出去 (gvlbp21)
3. 應該去申請別家試用看看,哪家情況好就跳哪家 (lplpllp416)
4. 應該先翻開合約好好看一下,就不會想去浪費這個時間了 (nbocpx77)
質疑:
1. 質疑中華電信是否真的壟斷4G (sa759, a7931425, h6i630, hbnfljoh, ol950357, tpoza700, cfopbjdp, bqqmfxbsn, ibolthp2012, nfqbmjoh, tijofMpwf, dbnfmmbmb)
2. 質疑種花是否構成壟斷 (bsjbeof)
疑問:
1. 疑問公民議題是什麼 (fspt01, ibolthp2012, nfqbmjoh, tijofMpwf)
2. 疑問壟斷在哪裡 (ay27771531, Gn4o, ex7931425, ol950357, tpoza700, cfopbjdp, bqqmfxbsn, ibolthp2012, nfqbmjoh, tijofMpwf, dbnfmmbmb)
抱怨:
1. 抱怨4G網路品質不好 (fspt01)
2. 抱怨中華電信壟斷 (fspt01, kbto4560, zffpvp, bsjbeof)
3. 抱怨網路中斷 (nfqbmjoh)
讚美:
1. 讚美固網 (obuibo2000)
2. 讚美公平交易委員會處分中華電信的行為 (fspt01)
提供資訊:
1. 提供了公平交易委員會處分中華電信的資訊 (fspt01)
2. 提供了中華電信被罰款的資訊 (ebbf, cv17, obuibo2000, ufoubjlbotp, kbto4560, djuzqpsu, gvlbp21, qbvm40807, lojhiu326, lplpllp416, sa759, TvohIzvo, Bovunjbp, zffpvp, bsjbeof, nbocpx77, Epsffo1010, ay27771531, Gn4o, ex7931425, h6i630, hbnfljoh, ol950357, tpoza700, cfopbjdp, bqqmfxbsn, ibolthp2012, nfqbmjoh, tijofMpwf, dbnfmmbmb)</s>
2024-07-15 08:08:37 - INFO - __main__ - Prompt sample 228 of the raw training set:
<s>Human:
文章:輿情作者,日期,輿情內文
bcdtbfhvtb11,6/25,照上面說的1199以下等於是5G的量到降速吃到飽,1399開始才是真的5G吃到飽 現在4G吃到飽我家靠窗速度還有100M上下,晚上有時速度會慢一點不到10M 今天看到5G方案讓我猶豫要不要升級5G了 有2個問題 1.50G的額度用USB分享給電腦也算嗎?還是開Wifi熱點才是? 2.如果用1399是熱點超過50G連本身吃到飽也降速到10M?還是只有被分享的裝置會降速到10M?
qpqpqbm,06/25 11:22,熱點分享量到是鎖卡 只剩手機能用吧 目前看起來是
qpqpqbm,06/25 11:22,這樣玩 不夠這也是中華董事長有說目前涵蓋不過才有
qpqpqbm,06/25 11:22,限制 未來就不知道了
zjfstbo,06/25 11:31,限制一堆價錢這樣 以為消費者會買單嗎 天真
qpqpqbm,06/25 11:32,市場跟客戶需求有關 客戶有需求就要滿足XD
qpqpqbm,06/25 11:33,https://reurl.cc/L3zKMe 高資費1399以上用的爽就好
qpqpqbm,06/25 11:34,不夠用早該辦固網滿足辦公家庭需求...
uudnm,06/25 12:07,我也好奇,熱點流量用完,手機的吃到飽應該不會限速
uudnm,06/25 12:07,吧
qpqpqbm,06/25 12:11,手機沒影響 就是要擋大戶 分享流量夠一般基本用途
olit9412235,06/25 14:51,為什麼拿台北市跟東京比,因為都是大都市,比較對
olit9412235,06/25 14:51,象是 docomo 跟中華電信,也是兩邊電信龍頭,這是
olit9412235,06/25 14:51,親身體驗。這還是手機而已,如果是租分享器速度會
olit9412235,06/25 14:51,更快。
lbnjs,06/25 15:18,額度是全部,包含使用SIM卡網路的所有裝置。
gmjfstlz,06/25 21:14,容量哪有在分你開分享與否?有網路都是用5g跑阿
===
問題:請對以上討論進行所有意見的分類與摘要,類別有六種: 建議、質疑、疑問、抱怨、讚美、提供資訊。意見摘要用條列的方式呈現,每條意見之後用括號包含持此意見的發言者,原本文章作者的意見也必須一併分析,同一條意見可能包含不只一人。
2024-07-15 08:08:37 - INFO - __main__ - Chosen sample 228 of the raw training set:
Assistant:
建議:
1. 可以考慮升級5G(bcdtbfhvtb11)
2. 需求不夠建議可以辦固網滿足辦公家庭需求(qpqpqbm)
質疑:
1. 對於新的5G方案有所疑慮,質疑是否值得升級(bcdtbfhvtb11)
2. 質疑行動網路公司將價格定得過高,以及設置太多限制(zjfstbo)
疑問:
1. 50G的額度是否包含用USB分享給電腦的數據(bcdtbfhvtb11)
2. 如果用1399是熱點超過50G連本身吃到飽也降速到10M(bcdtbfhvtb11)
3. 對於熱點流量用完後,手機的吃到飽是否會限速(uudnm)
抱怨:
1. 對於5G資費計劃有許多限制,價格又偏高感到不滿(zjfstbo)
讚美:
1. 沒有讚美的意見
提供資訊:
1. 分享數據到其他裝置也會計算在內(lbnjs, gmjfstlz)
2. 若熱點流量用完,手機所使用的數據還是5G數據並不受限制(uudnm, qpqpqbm)
3. 對於市場需求關係的說明(qpqpqbm)
4. 台北市與東京大都市的比較(olit9412235)
5. 提供相關的網路連結(qpqpqbm)</s>
2024-07-15 08:08:37 - INFO - __main__ - Rejected sample 228 of the raw training set:
Assistant:
建議:
1. 高資費1399以上用的爽就好 (qpqpqbm)
2. 不夠用早該辦固網滿足辦公家庭需求 (qpqpqbm)
質疑:
1. 限制一堆價錢這樣 以為消費者會買單嗎 天真 (zjfstbo)
疑問:
1. 1.50G的額度用USB分享給電腦也算嗎?還是開Wifi熱點才是? (bcdtbfhvtb11)
2. 如果用1399是熱點超過50G連本身吃到飽也降速到10M?還是只有被分享的裝置會降速到10M? (bcdtbfhvtb11)
抱怨:
1. 市場跟客戶需求有關 客戶有需求就要滿足XD (qpqpqbm)
讚美:
無
提供資訊:
1. 熱點分享量到是鎖卡 只剩手機能用吧 目前看起來是這樣玩 不夠這也是中華董事長有說目前涵蓋不過才有限制 未來就不知道了 (qpqpqbm)
2. 手機沒影響 就是要擋大戶 分享流量夠一般基本用途 (qpqpqbm)
3. 額度是全部,包含使用SIM卡網路的所有裝置 (lbnjs)
4. 容量哪有在分你開分享與否?有網路都是用5g跑阿 (gmjfstlz)
5. 為什麼拿台北市跟東京比,因為都是大都市,比較對象是 docomo 跟中華電信,也是兩邊電信龍頭,這是親身體驗 (olit9412235)</s>
2024-07-15 08:08:37 - INFO - __main__ - Prompt sample 51 of the raw training set:
<s>Human:
文章:輿情作者,日期,輿情內文
epodmjboh,4/22,大家好!! 小弟這次剛好符合中華的公教雙飽資格, 想衝一波, 但是只想辦個299的就好, 因為平常都是用預付卡買1G, 不過有看到299裡面, 有個優惠是群內優惠 免費, 想問這個 是要需要多加50加電話號碼進群組嗎? 還是就是299就可以直接設定電話進群組 對這個有點不懂,想問有無人可以解答 謝謝大大們
kbnft732,04/22 18:29,你要不要考慮直接問客服?
qvtiqvmm,04/22 18:34,好像不用加50。 剛好今天辦299有問。
yv3,04/22 18:44,台哥的群內免費試 每個單位有一個編號 例如台北市政
yv3,04/22 18:45,府教育局 編號是1234 那教育局的員工辦公教專案
===
問題:請對以上討論進行所有意見的分類與摘要,類別有六種: 建議、質疑、疑問、抱怨、讚美、提供資訊。意見摘要用條列的方式呈現,每條意見之後用括號包含持此意見的發言者,原本文章作者的意見也必須一併分析,同一條意見可能包含不只一人。
2024-07-15 08:08:37 - INFO - __main__ - Chosen sample 51 of the raw training set:
Assistant:
- 疑問:想要了解299元的方案是不是可以直接加入群組,還是要多加50元,並希望有人能提供資訊 (epodmjboh)
- 建議:直接詢問客服以獲得最確實資訊 (kbnft732)
- 提供資訊:今天辦理299元專案,表示不需要加50元 (qvtiqvmm)
- 提供資訊:解釋群內免費的方式,每個單位組會有一個編號 (yv3)</s>
2024-07-15 08:08:37 - INFO - __main__ - Rejected sample 51 of the raw training set:
Assistant:
- 疑問:
- 想問這個 是要需要多加50加電話號碼進群組嗎? 還是就是299就可以直接設定電話進群組 (epodmjboh)
- 想問有無人可以解答 (epodmjboh)
- 建議:
- 你要不要考慮直接問客服? (kbnft732)
- 提供資訊:
- 好像不用加50。 剛好今天辦299有問。 (qvtiqvmm)
- 台哥的群內免費試 每個單位有一個編號 例如台北市政府教育局 編號是1234 那教育局的員工辦公教專案 (yv3)</s>
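Each logged pair above presumably corresponds to one record in the JSONL training file, with the fields listed in the dataset features. A hypothetical sketch of one such record (field contents are illustrative placeholders, not actual data):

```python
import json

# Hypothetical shape of one line of training_ptt_dpo-include_id-converted.jsonl,
# inferred from the features printed earlier (prompt / chosen / rejected / messages).
record = {
    "prompt": "文章:輿情作者,日期,輿情內文 ...\n===\n問題:請對以上討論進行所有意見的分類與摘要 ...",
    "chosen": "建議:\n- ... (發言者)",
    "rejected": "建議:\n1. ... (發言者)",
}
line = json.dumps(record, ensure_ascii=False)
print(json.loads(line)["prompt"].startswith("文章"))  # True
```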
[INFO 20240712-1747] init DPOTrainer
[2024-07-15 08:08:45,260] [INFO] [partition_parameters.py:347:__exit__] finished initializing model - num_params = 291, num_elems = 7.24B
[2024-07-15 08:11:54,287] [INFO] [partition_parameters.py:347:__exit__] finished initializing model - num_params = 582, num_elems = 14.48B
2024-07-15 08:11:58 - WARNING - datasets.fingerprint - Parameter 'function'=<bound method DPOTrainer.tokenize_row of <trl.trainer.dpo_trainer.DPOTrainer object at 0x7f79c52c4340>> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
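The warning above means `datasets` could not compute a deterministic fingerprint for `DPOTrainer.tokenize_row`: the bound method does not serialize, so a random hash is used and the tokenized dataset is recomputed on every run instead of being served from cache. The underlying failure mode is easy to reproduce with plain `pickle` (a simplified illustration — `datasets` actually fingerprints with `dill`, which handles more cases):

```python
import pickle

def can_pickle(obj) -> bool:
    # Mirrors what fingerprinting needs: a stable serialized form of the function.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(can_pickle(len))          # a named global pickles by reference -> True
print(can_pickle(lambda x: x))  # a lambda does not -> False, i.e. random hash
```

The usual workaround is to pass a module-level function (or a picklable callable) to `Dataset.map` so runs can reuse the cache.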
[2024-07-15 08:12:09,915] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.2, git-hash=unknown, git-branch=unknown
[2024-07-15 08:12:09,928] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-07-15 08:12:09,929] [INFO] [logging.py:96:log_dist] [Rank 0] Creating ZeRO Offload
[2024-07-15 08:12:10,036] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-07-15 08:12:10,037] [INFO] [utils.py:803:see_memory_usage] MA 6.74 GB Max_MA 7.42 GB CA 7.89 GB Max_CA 8 GB
[2024-07-15 08:12:10,037] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 17.32 GB, percent = 2.0%
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2024-07-15 08:12:10,132] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-07-15 08:12:10,132] [INFO] [utils.py:803:see_memory_usage] MA 6.74 GB Max_MA 6.74 GB CA 7.89 GB Max_CA 8 GB
[2024-07-15 08:12:10,133] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 17.34 GB, percent = 2.0%
[2024-07-15 08:12:10,133] [INFO] [config.py:972:print] DeepSpeedEngine configuration:
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] amp_enabled .................. False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] amp_params ................... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] bfloat16_enabled ............. True
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] checkpoint_parallel_write_pipeline False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] checkpoint_tag_validation_enabled True
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] checkpoint_tag_validation_fail False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f79c5f67a00>
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] communication_data_type ...... None
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] curriculum_enabled_legacy .... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] curriculum_params_legacy ..... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] data_efficiency_enabled ...... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] dataloader_drop_last ......... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] disable_allgather ............ False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] dump_state ................... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] dynamic_loss_scale_args ...... None
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_enabled ........... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_gas_boundary_resolution 1
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_layer_num ......... 0
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_max_iter .......... 100
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_stability ......... 1e-06
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_tol ............... 0.01
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] eigenvalue_verbose ........... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] elasticity_enabled ........... False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] fp16_auto_cast ............... None
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] fp16_enabled ................. False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] fp16_master_weights_and_gradients False
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] global_rank .................. 0
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] grad_accum_dtype ............. None
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] gradient_accumulation_steps .. 2
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] gradient_clipping ............ 1.0
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] gradient_predivide_factor .... 1.0
[2024-07-15 08:12:10,134] [INFO] [config.py:976:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] initial_dynamic_scale ........ 1
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] load_universal_checkpoint .... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] loss_scale ................... 1.0
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] memory_breakdown ............. False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] mics_hierarchial_params_gather False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] mics_shard_size .............. -1
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] optimizer_legacy_fusion ...... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] optimizer_name ............... None
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] optimizer_params ............. None
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] pld_enabled .................. False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] pld_params ................... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] prescale_gradients ........... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] scheduler_name ............... None
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] scheduler_params ............. None
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] seq_parallel_communication_data_type torch.float32
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] sparse_attention ............. None
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] sparse_gradients_enabled ..... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] steps_per_print .............. inf
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] train_batch_size ............. 64
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] train_micro_batch_size_per_gpu 8
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] use_node_local_storage ....... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] wall_clock_breakdown ......... False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] weight_quantization_config ... None
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] world_size ................... 4
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] zero_allow_untested_optimizer False
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] zero_enabled ................. True
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] zero_force_ds_cpu_optimizer .. True
[2024-07-15 08:12:10,135] [INFO] [config.py:976:print] zero_optimization_stage ...... 3
[2024-07-15 08:12:10,135] [INFO] [config.py:962:print_user_config] json = {
"train_batch_size": 64,
"train_micro_batch_size_per_gpu": 8,
"gradient_accumulation_steps": 2,
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "none",
"nvme_path": null
},
"offload_param": {
"device": "none",
"nvme_path": null
},
"stage3_gather_16bit_weights_on_model_save": true
},
"gradient_clipping": 1.0,
"steps_per_print": inf,
"bf16": {
"enabled": true
},
"fp16": {
"enabled": false
},
"zero_optimization.reduce_bucket_size": 1.677722e+07,
"zero_optimization.stage3_param_persistence_threshold": 4.096000e+04,
"zero_optimization.stage3_prefetch_bucket_size": 1.509949e+07
}
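For reference, the user config printed above can be reconstructed as a Python dict; DeepSpeed enforces an invariant between the batch-size fields that can be checked directly. This is a sketch rebuilt from the log, not the original config file, and the `auto`-derived values (`reduce_bucket_size` etc.) are omitted.

```python
# Sketch reconstructed from the printed user config above (not the original
# ds_config.json). The scaled entries like zero_optimization.reduce_bucket_size
# are filled in elsewhere from the model dimensions and are left out here.
ds_config = {
    "train_batch_size": 64,
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 2,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "none", "nvme_path": None},
        "offload_param": {"device": "none", "nvme_path": None},
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_clipping": 1.0,
    "bf16": {"enabled": True},
    "fp16": {"enabled": False},
}

world_size = 4  # from the log: world_size ... 4

# DeepSpeed requires:
#   train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size
assert (
    ds_config["train_micro_batch_size_per_gpu"]
    * ds_config["gradient_accumulation_steps"]
    * world_size
    == ds_config["train_batch_size"]
)  # 8 * 2 * 4 == 64
```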
[INFO 20240712-1747] ☆☆☆
[DEBUG 20240712-1804] checkpoint = None

[2024-07-15 08:12:35,789] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.2, git-hash=unknown, git-branch=unknown
[2024-07-15 08:12:35,796] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-07-15 08:12:35,797] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-07-15 08:12:35,797] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-07-15 08:12:35,808] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-07-15 08:12:35,808] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2024-07-15 08:12:35,808] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2024-07-15 08:12:35,808] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
[2024-07-15 08:12:35,927] [INFO] [utils.py:802:see_memory_usage] Stage 3 initialize beginning
[2024-07-15 08:12:35,927] [INFO] [utils.py:803:see_memory_usage] MA 6.74 GB Max_MA 6.74 GB CA 6.94 GB Max_CA 8 GB
[2024-07-15 08:12:35,928] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.62 GB, percent = 2.1%
[2024-07-15 08:12:35,929] [INFO] [stage3.py:126:__init__] Reduce bucket size 500,000,000
[2024-07-15 08:12:35,929] [INFO] [stage3.py:127:__init__] Prefetch bucket size 50,000,000
[2024-07-15 08:12:36,027] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-07-15 08:12:36,028] [INFO] [utils.py:803:see_memory_usage] MA 6.74 GB Max_MA 6.74 GB CA 6.94 GB Max_CA 7 GB
[2024-07-15 08:12:36,028] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.62 GB, percent = 2.1%
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2024-07-15 08:12:36,149] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-07-15 08:12:36,150] [INFO] [utils.py:803:see_memory_usage] MA 6.74 GB Max_MA 6.74 GB CA 6.94 GB Max_CA 7 GB
[2024-07-15 08:12:36,150] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.62 GB, percent = 2.1%
[2024-07-15 08:12:36,268] [INFO] [utils.py:802:see_memory_usage] Before creating fp16 partitions
[2024-07-15 08:12:36,269] [INFO] [utils.py:803:see_memory_usage] MA 6.74 GB Max_MA 6.74 GB CA 6.94 GB Max_CA 7 GB
[2024-07-15 08:12:36,269] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.62 GB, percent = 2.1%
[2024-07-15 08:12:37,788] [INFO] [utils.py:802:see_memory_usage] After creating fp16 partitions: 2
[2024-07-15 08:12:37,789] [INFO] [utils.py:803:see_memory_usage] MA 6.75 GB Max_MA 6.75 GB CA 7.28 GB Max_CA 7 GB
[2024-07-15 08:12:37,789] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.64 GB, percent = 2.2%
[2024-07-15 08:12:37,890] [INFO] [utils.py:802:see_memory_usage] Before creating fp32 partitions
[2024-07-15 08:12:37,890] [INFO] [utils.py:803:see_memory_usage] MA 6.75 GB Max_MA 6.75 GB CA 7.28 GB Max_CA 7 GB
[2024-07-15 08:12:37,891] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.64 GB, percent = 2.2%
[2024-07-15 08:12:37,998] [INFO] [utils.py:802:see_memory_usage] After creating fp32 partitions
[2024-07-15 08:12:37,999] [INFO] [utils.py:803:see_memory_usage] MA 13.49 GB Max_MA 14.97 GB CA 15.92 GB Max_CA 16 GB
[2024-07-15 08:12:37,999] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.64 GB, percent = 2.2%
[2024-07-15 08:12:38,100] [INFO] [utils.py:802:see_memory_usage] Before initializing optimizer states
[2024-07-15 08:12:38,101] [INFO] [utils.py:803:see_memory_usage] MA 13.49 GB Max_MA 13.49 GB CA 15.92 GB Max_CA 16 GB
[2024-07-15 08:12:38,101] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.64 GB, percent = 2.2%
[2024-07-15 08:12:38,267] [INFO] [utils.py:802:see_memory_usage] After initializing optimizer states
[2024-07-15 08:12:38,267] [INFO] [utils.py:803:see_memory_usage] MA 26.98 GB Max_MA 33.73 GB CA 36.97 GB Max_CA 37 GB
[2024-07-15 08:12:38,268] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.64 GB, percent = 2.2%
[2024-07-15 08:12:38,268] [INFO] [stage3.py:460:_setup_for_real_optimizer] optimizer state initialized
[2024-07-15 08:12:38,576] [INFO] [utils.py:802:see_memory_usage] After initializing ZeRO optimizer
[2024-07-15 08:12:38,576] [INFO] [utils.py:803:see_memory_usage] MA 31.29 GB Max_MA 31.77 GB CA 51.01 GB Max_CA 51 GB
[2024-07-15 08:12:38,577] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 18.64 GB, percent = 2.2%
[2024-07-15 08:12:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2024-07-15 08:12:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-07-15 08:12:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-07-15 08:12:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]
[2024-07-15 08:12:38,578] [INFO] [config.py:972:print] DeepSpeedEngine configuration:
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] amp_enabled .................. False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] amp_params ................... False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] bfloat16_enabled ............. True
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] checkpoint_parallel_write_pipeline False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] checkpoint_tag_validation_enabled True
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] checkpoint_tag_validation_fail False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f7990a07520>
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] communication_data_type ...... None
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] curriculum_enabled_legacy .... False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] curriculum_params_legacy ..... False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] data_efficiency_enabled ...... False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] dataloader_drop_last ......... False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] disable_allgather ............ False
[2024-07-15 08:12:38,578] [INFO] [config.py:976:print] dump_state ................... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] dynamic_loss_scale_args ...... None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_enabled ........... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_gas_boundary_resolution 1
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_layer_num ......... 0
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_max_iter .......... 100
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_stability ......... 1e-06
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_tol ............... 0.01
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] eigenvalue_verbose ........... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] elasticity_enabled ........... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] fp16_auto_cast ............... None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] fp16_enabled ................. False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] fp16_master_weights_and_gradients False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] global_rank .................. 0
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] grad_accum_dtype ............. None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] gradient_accumulation_steps .. 2
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] gradient_clipping ............ 1.0
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] gradient_predivide_factor .... 1.0
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] initial_dynamic_scale ........ 1
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] load_universal_checkpoint .... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] loss_scale ................... 1.0
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] memory_breakdown ............. False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] mics_hierarchial_params_gather False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] mics_shard_size .............. -1
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] optimizer_legacy_fusion ...... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] optimizer_name ............... None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] optimizer_params ............. None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] pld_enabled .................. False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] pld_params ................... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] prescale_gradients ........... False
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] scheduler_name ............... None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] scheduler_params ............. None
[2024-07-15 08:12:38,579] [INFO] [config.py:976:print] seq_parallel_communication_data_type torch.float32
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] sparse_attention ............. None
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] sparse_gradients_enabled ..... False
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] steps_per_print .............. inf
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] train_batch_size ............. 64
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] train_micro_batch_size_per_gpu 8
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] use_node_local_storage ....... False
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] wall_clock_breakdown ......... False
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] weight_quantization_config ... None
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] world_size ................... 4
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] zero_allow_untested_optimizer True
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] zero_enabled ................. True
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] zero_force_ds_cpu_optimizer .. True
[2024-07-15 08:12:38,580] [INFO] [config.py:976:print] zero_optimization_stage ...... 3
[2024-07-15 08:12:38,580] [INFO] [config.py:962:print_user_config] json = {
"train_batch_size": 64,
"train_micro_batch_size_per_gpu": 8,
"gradient_accumulation_steps": 2,
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "none",
"nvme_path": null
},
"offload_param": {
"device": "none",
"nvme_path": null
},
"stage3_gather_16bit_weights_on_model_save": true
},
"gradient_clipping": 1.0,
"steps_per_print": inf,
"bf16": {
"enabled": true
},
"fp16": {
"enabled": false
},
"zero_allow_untested_optimizer": true
}
[2024-07-15 08:13:10,011] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
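The warning above suggests synchronizing cache flushes across ranks if the allocator keeps flushing under memory pressure. A minimal sketch of that suggestion, assuming a hypothetical `FLUSH_INTERVAL` and training-loop names not present in the log:

```python
# Hypothetical sketch of acting on the warning above: flush the allocator
# cache on the same step on every rank, so collectives stay aligned.
# FLUSH_INTERVAL and the loop variable names are illustrative assumptions.

FLUSH_INTERVAL = 50  # hypothetical: flush every 50 optimizer steps

def should_flush(step: int, interval: int = FLUSH_INTERVAL) -> bool:
    """Return True on steps where every rank should empty its cache."""
    return step > 0 and step % interval == 0

# Inside the training loop (requires deepspeed + a GPU at runtime):
#     from deepspeed.accelerator import get_accelerator
#     for step, batch in enumerate(dataloader):
#         loss = engine(batch)
#         engine.backward(loss)
#         engine.step()
#         if should_flush(step):
#             get_accelerator().empty_cache()  # same step on all ranks
```

Because the flush is deterministic in `step`, all four ranks hit it simultaneously, which is what the warning asks for.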
{'loss': 0.6931, 'grad_norm': 21.960294652093665, 'learning_rate': 1.6666666666666665e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -430.01824951171875, 'logps/chosen': -522.2332763671875, 'logits/rejected': -1.5321521759033203, 'logits/chosen': -1.5251624584197998, 'epoch': 0.04}
{'loss': 0.6931, 'grad_norm': 20.29259471509865, 'learning_rate': 3.333333333333333e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -407.6622314453125, 'logps/chosen': -442.01617431640625, 'logits/rejected': -1.5653201341629028, 'logits/chosen': -1.58231520652771, 'epoch': 0.07}
{'loss': 0.6925, 'grad_norm': 21.124662560316178, 'learning_rate': 5e-07, 'rewards/chosen': -0.0025113581214100122, 'rewards/rejected': -0.000655155279673636, 'rewards/accuracies': 0.3125, 'rewards/margins': -0.0018562028417363763, 'logps/rejected': -416.7310485839844, 'logps/chosen': -491.6346740722656, 'logits/rejected': -1.5683329105377197, 'logits/chosen': -1.5681908130645752, 'epoch': 0.11}
{'loss': 0.6914, 'grad_norm': 21.322323109359544, 'learning_rate': 4.978612153434526e-07, 'rewards/chosen': 0.000672187830787152, 'rewards/rejected': -0.0027900219429284334, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.0034622098319232464, 'logps/rejected': -379.40631103515625, 'logps/chosen': -509.2655029296875, 'logits/rejected': -1.6126008033752441, 'logits/chosen': -1.537656307220459, 'epoch': 0.15}
{'loss': 0.6851, 'grad_norm': 20.49472730639404, 'learning_rate': 4.91481456572267e-07, 'rewards/chosen': 0.003443555673584342, 'rewards/rejected': -0.011739811860024929, 'rewards/accuracies': 0.875, 'rewards/margins': 0.01518336683511734, 'logps/rejected': -403.841796875, 'logps/chosen': -488.357421875, 'logits/rejected': -1.5996417999267578, 'logits/chosen': -1.5487549304962158, 'epoch': 0.19}
{'loss': 0.6694, 'grad_norm': 21.062141100447082, 'learning_rate': 4.809698831278217e-07, 'rewards/chosen': 0.005675068125128746, 'rewards/rejected': -0.03574930876493454, 'rewards/accuracies': 1.0, 'rewards/margins': 0.041424378752708435, 'logps/rejected': -404.27301025390625, 'logps/chosen': -471.77337646484375, 'logits/rejected': -1.543208360671997, 'logits/chosen': -1.4875125885009766, 'epoch': 0.22}
{'loss': 0.6585, 'grad_norm': 19.360093428533585, 'learning_rate': 4.6650635094610966e-07, 'rewards/chosen': 0.003629775019362569, 'rewards/rejected': -0.057777851819992065, 'rewards/accuracies': 1.0, 'rewards/margins': 0.061407629400491714, 'logps/rejected': -342.85699462890625, 'logps/chosen': -442.50823974609375, 'logits/rejected': -1.5954453945159912, 'logits/chosen': -1.5651326179504395, 'epoch': 0.26}
{'loss': 0.6262, 'grad_norm': 20.593014111238194, 'learning_rate': 4.483383350728088e-07, 'rewards/chosen': 0.012023219838738441, 'rewards/rejected': -0.1112871766090393, 'rewards/accuracies': 0.9375, 'rewards/margins': 0.123310387134552, 'logps/rejected': -454.96221923828125, 'logps/chosen': -557.0953979492188, 'logits/rejected': -1.4964159727096558, 'logits/chosen': -1.4438291788101196, 'epoch': 0.3}
{'loss': 0.61, 'grad_norm': 18.97598534624882, 'learning_rate': 4.2677669529663686e-07, 'rewards/chosen': 0.005186537280678749, 'rewards/rejected': -0.20743262767791748, 'rewards/accuracies': 1.0, 'rewards/margins': 0.21261915564537048, 'logps/rejected': -464.10333251953125, 'logps/chosen': -545.1387939453125, 'logits/rejected': -1.501185417175293, 'logits/chosen': -1.4766960144042969, 'epoch': 0.33}
{'loss': 0.5895, 'grad_norm': 18.27743651838518, 'learning_rate': 4.0219035725218013e-07, 'rewards/chosen': 0.0059821512550115585, 'rewards/rejected': -0.2528378963470459, 'rewards/accuracies': 1.0, 'rewards/margins': 0.2588200271129608, 'logps/rejected': -449.51141357421875, 'logps/chosen': -547.5632934570312, 'logits/rejected': -1.575829029083252, 'logits/chosen': -1.577054500579834, 'epoch': 0.37}
{'loss': 0.5797, 'grad_norm': 17.63288173591455, 'learning_rate': 3.75e-07, 'rewards/chosen': -0.056207820773124695, 'rewards/rejected': -0.24953657388687134, 'rewards/accuracies': 1.0, 'rewards/margins': 0.19332876801490784, 'logps/rejected': -381.8707275390625, 'logps/chosen': -413.1561279296875, 'logits/rejected': -1.5630258321762085, 'logits/chosen': -1.528336524963379, 'epoch': 0.41}
{'loss': 0.5321, 'grad_norm': 19.011135172725975, 'learning_rate': 3.4567085809127245e-07, 'rewards/chosen': -0.09093473851680756, 'rewards/rejected': -0.410855233669281, 'rewards/accuracies': 0.9375, 'rewards/margins': 0.31992048025131226, 'logps/rejected': -390.5659484863281, 'logps/chosen': -429.57568359375, 'logits/rejected': -1.5054963827133179, 'logits/chosen': -1.518822431564331, 'epoch': 0.44}
{'loss': 0.4824, 'grad_norm': 19.115556004222547, 'learning_rate': 3.147047612756302e-07, 'rewards/chosen': -0.12970858812332153, 'rewards/rejected': -0.6150213479995728, 'rewards/accuracies': 1.0, 'rewards/margins': 0.4853127896785736, 'logps/rejected': -425.39727783203125, 'logps/chosen': -497.06396484375, 'logits/rejected': -1.4986257553100586, 'logits/chosen': -1.4965465068817139, 'epoch': 0.48}
{'loss': 0.4697, 'grad_norm': 19.291540876536054, 'learning_rate': 2.826315480550129e-07, 'rewards/chosen': -0.246780663728714, 'rewards/rejected': -0.7276896238327026, 'rewards/accuracies': 1.0, 'rewards/margins': 0.48090896010398865, 'logps/rejected': -438.9479675292969, 'logps/chosen': -486.5482482910156, 'logits/rejected': -1.4534223079681396, 'logits/chosen': -1.4486994743347168, 'epoch': 0.52}
{'loss': 0.4387, 'grad_norm': 18.729411908069906, 'learning_rate': 2.5e-07, 'rewards/chosen': -0.365586519241333, 'rewards/rejected': -0.9307430982589722, 'rewards/accuracies': 1.0, 'rewards/margins': 0.5651566386222839, 'logps/rejected': -494.84197998046875, 'logps/chosen': -521.0465698242188, 'logits/rejected': -1.5101096630096436, 'logits/chosen': -1.5300798416137695, 'epoch': 0.56}
{'loss': 0.4176, 'grad_norm': 18.803005444933415, 'learning_rate': 2.1736845194498716e-07, 'rewards/chosen': -0.517939567565918, 'rewards/rejected': -1.0383610725402832, 'rewards/accuracies': 0.875, 'rewards/margins': 0.5204216241836548, 'logps/rejected': -369.7915344238281, 'logps/chosen': -459.795654296875, 'logits/rejected': -1.48337721824646, 'logits/chosen': -1.4446580410003662, 'epoch': 0.59}
{'loss': 0.403, 'grad_norm': 16.74794343891432, 'learning_rate': 1.8529523872436977e-07, 'rewards/chosen': -0.6889065504074097, 'rewards/rejected': -1.3369628190994263, 'rewards/accuracies': 1.0, 'rewards/margins': 0.6480563282966614, 'logps/rejected': -611.8056640625, 'logps/chosen': -628.4656372070312, 'logits/rejected': -1.4669878482818604, 'logits/chosen': -1.4267468452453613, 'epoch': 0.63}
{'loss': 0.4156, 'grad_norm': 16.666326698051094, 'learning_rate': 1.5432914190872756e-07, 'rewards/chosen': -0.7545160055160522, 'rewards/rejected': -1.355210304260254, 'rewards/accuracies': 0.875, 'rewards/margins': 0.6006942391395569, 'logps/rejected': -625.54248046875, 'logps/chosen': -662.2476806640625, 'logits/rejected': -1.4642560482025146, 'logits/chosen': -1.4678699970245361, 'epoch': 0.67}
{'loss': 0.3557, 'grad_norm': 20.816617967599026, 'learning_rate': 1.2500000000000005e-07, 'rewards/chosen': -0.7387192249298096, 'rewards/rejected': -1.7132151126861572, 'rewards/accuracies': 1.0, 'rewards/margins': 0.9744957685470581, 'logps/rejected': -571.2609252929688, 'logps/chosen': -588.0455322265625, 'logits/rejected': -1.5267467498779297, 'logits/chosen': -1.5228757858276367, 'epoch': 0.7}
{'loss': 0.3822, 'grad_norm': 15.44110172622846, 'learning_rate': 9.780964274781983e-08, 'rewards/chosen': -0.7607905268669128, 'rewards/rejected': -1.6602680683135986, 'rewards/accuracies': 1.0, 'rewards/margins': 0.899477481842041, 'logps/rejected': -557.4992065429688, 'logps/chosen': -621.6253051757812, 'logits/rejected': -1.3522812128067017, 'logits/chosen': -1.3689507246017456, 'epoch': 0.74}
{'loss': 0.3529, 'grad_norm': 15.868660349315958, 'learning_rate': 7.322330470336313e-08, 'rewards/chosen': -0.9673899412155151, 'rewards/rejected': -1.7681092023849487, 'rewards/accuracies': 1.0, 'rewards/margins': 0.8007193207740784, 'logps/rejected': -677.4046630859375, 'logps/chosen': -700.6104125976562, 'logits/rejected': -1.4624991416931152, 'logits/chosen': -1.4828577041625977, 'epoch': 0.78}
{'loss': 0.3545, 'grad_norm': 15.694101584565033, 'learning_rate': 5.166166492719124e-08, 'rewards/chosen': -0.9602385759353638, 'rewards/rejected': -1.9203271865844727, 'rewards/accuracies': 1.0, 'rewards/margins': 0.9600885510444641, 'logps/rejected': -537.06787109375, 'logps/chosen': -577.3356323242188, 'logits/rejected': -1.4121013879776, 'logits/chosen': -1.4382765293121338, 'epoch': 0.81}
{'loss': 0.3302, 'grad_norm': 14.540270530925572, 'learning_rate': 3.349364905389032e-08, 'rewards/chosen': -0.7483774423599243, 'rewards/rejected': -2.161633253097534, 'rewards/accuracies': 1.0, 'rewards/margins': 1.4132558107376099, 'logps/rejected': -686.2198486328125, 'logps/chosen': -638.3490600585938, 'logits/rejected': -1.3795686960220337, 'logits/chosen': -1.386987566947937, 'epoch': 0.85}
{'loss': 0.3296, 'grad_norm': 17.790158271068087, 'learning_rate': 1.9030116872178314e-08, 'rewards/chosen': -0.8251708149909973, 'rewards/rejected': -1.8419311046600342, 'rewards/accuracies': 1.0, 'rewards/margins': 1.0167603492736816, 'logps/rejected': -591.7145385742188, 'logps/chosen': -697.348388671875, 'logits/rejected': -1.4148858785629272, 'logits/chosen': -1.3577136993408203, 'epoch': 0.89}
{'loss': 0.3282, 'grad_norm': 15.54643458961614, 'learning_rate': 8.518543427732949e-09, 'rewards/chosen': -0.9578792452812195, 'rewards/rejected': -1.9465538263320923, 'rewards/accuracies': 0.9375, 'rewards/margins': 0.988674521446228, 'logps/rejected': -653.150146484375, 'logps/chosen': -599.9559326171875, 'logits/rejected': -1.4120821952819824, 'logits/chosen': -1.4623993635177612, 'epoch': 0.93}
{'loss': 0.3016, 'grad_norm': 16.648883686759966, 'learning_rate': 2.1387846565474044e-09, 'rewards/chosen': -1.085956335067749, 'rewards/rejected': -2.2164981365203857, 'rewards/accuracies': 0.9375, 'rewards/margins': 1.1305416822433472, 'logps/rejected': -615.1410522460938, 'logps/chosen': -623.4697265625, 'logits/rejected': -1.327600359916687, 'logits/chosen': -1.3788025379180908, 'epoch': 0.96}
{'loss': 0.3356, 'grad_norm': 13.437946112135876, 'learning_rate': 0.0, 'rewards/chosen': -0.9807089567184448, 'rewards/rejected': -2.008396625518799, 'rewards/accuracies': 0.9375, 'rewards/margins': 1.0276877880096436, 'logps/rejected': -537.026611328125, 'logps/chosen': -570.8668212890625, 'logits/rejected': -1.4338197708129883, 'logits/chosen': -1.4525063037872314, 'epoch': 1.0}
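The per-step metrics above are consistent with the standard DPO objective, where the per-example loss is `-log sigmoid(rewards/chosen - rewards/rejected)` and the logged rewards are already beta-scaled. A quick sanity check (a sketch, not the trainer's code):

```python
import math

def dpo_loss_from_margin(margin: float) -> float:
    """Per-example DPO loss: -log sigmoid(rewards/chosen - rewards/rejected).
    The logged 'rewards/margins' value is already beta-scaled, so beta
    does not reappear here."""
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# First logged step: rewards/margins == 0.0, so loss is exactly -log(0.5),
# matching the logged 0.6931.
assert abs(dpo_loss_from_margin(0.0) - 0.6931) < 1e-4

# Later rows match only approximately: the logged loss is the mean of
# per-example losses, not the loss of the mean margin (Jensen gap).
```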
[2024-07-15 08:20:51,092] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step27 is about to be saved!
[2024-07-15 08:20:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: DPO/CHT-6_20240715-1255/checkpoint-27/global_step27/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-07-15 08:20:51,122] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving DPO/CHT-6_20240715-1255/checkpoint-27/global_step27/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-07-15 08:20:51,199] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved DPO/CHT-6_20240715-1255/checkpoint-27/global_step27/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-07-15 08:20:51,269] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving DPO/CHT-6_20240715-1255/checkpoint-27/global_step27/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-07-15 08:26:24,053] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved DPO/CHT-6_20240715-1255/checkpoint-27/global_step27/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-07-15 08:26:24,128] [INFO] [engine.py:3393:_save_zero_checkpoint] zero checkpoint saved DPO/CHT-6_20240715-1255/checkpoint-27/global_step27/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-07-15 08:26:31,282] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step27 is ready now!
[INFO 20240712-1747] ☆☆☆☆☆
{'train_runtime': 833.3436, 'train_samples_per_second': 2.074, 'train_steps_per_second': 0.032, 'train_loss': 0.4969749693517332, 'epoch': 1.0}
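The final summary is internally consistent with the rest of the run: 27 optimizer steps (see `global_step27` in the checkpoint path) at `train_batch_size` 64 gives 1728 samples in the single epoch, which reproduces the logged throughput:

```python
# Cross-check of the summary line against the run above.
steps = 27                 # from the checkpoint tag global_step27
train_batch_size = 64      # from the DeepSpeed config dump
train_runtime = 833.3436   # seconds, from the summary line

samples = steps * train_batch_size            # 1728 samples in one epoch
samples_per_second = samples / train_runtime  # ≈ 2.074, as logged
steps_per_second = steps / train_runtime      # ≈ 0.032, as logged

assert round(samples_per_second, 3) == 2.074
assert round(steps_per_second, 3) == 0.032
```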