---
library_name: peft
license: cc-by-nc-4.0
base_model: mistralai/Mixtral-8x7B-v0.1
datasets:
  - HuggingFaceH4/ultrachat_200k
  - rohansolo/BB_HindiHinglishV2
model-index:
  - name: BB-Mixtral-HindiHinglish-8x7B-v0.1
    results: []
language:
  - hi
---

Model Description

Mixtral fine-tuned for Hindi and Hinglish, as part of ongoing experiments by bb deep learning systems.

Model Sources

  • Paper: [More Information Coming Soon]

Training Details

Training Data

A mix of HuggingFaceH4/ultrachat_200k and rohansolo/BB_HindiHinglishV2 was used, for a total of 573,014,566 tokens in Hindi, Romanised Hindi, and English.
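
For reference, a minimal sketch of assembling this mixture with the `datasets` library is shown below; the split and column names for rohansolo/BB_HindiHinglishV2 are assumptions for illustration, not details taken from this card.

```python
from datasets import load_dataset, concatenate_datasets

# Load the two mixture components named above.
# ultrachat_200k ships a "train_sft" split; the split and column names for
# BB_HindiHinglishV2 are assumptions for illustration only.
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
hinglish = load_dataset("rohansolo/BB_HindiHinglishV2", split="train")

# Keep only the chat turns so the two schemas line up before concatenating.
train_mix = concatenate_datasets(
    [ultrachat.select_columns(["messages"]), hinglish.select_columns(["messages"])]
)
print(train_mix)
```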

Training Procedure

The final training loss was 0.8977639613123988.

The model was trained using the following hyperparameters:

  • warmup_steps: 100
  • weight_decay: 0.05
  • num_epochs: 1
  • optimizer: paged_adamw_8bit
  • lr_scheduler: cosine
  • learning_rate: 0.0002
  • lora_r: 32
  • lora_alpha: 16
  • lora_dropout: 0.05
  • lora_target_modules: q_proj, k_proj, v_proj, o_proj, w1, w2, w3
  • lora_target_linear:
  • lora_fan_in_fan_out:
  • lora_modules_to_save: embed_tokens, lm_head
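
Expressed as a `peft` LoraConfig, these settings would look roughly like the sketch below; the task type and bias handling are assumptions, as the card does not list them.

```python
from peft import LoraConfig

# LoRA settings as listed above; w1/w2/w3 are Mixtral's expert MLP projections.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",  # assumption: causal language modelling
)
```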

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
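
As a rough guide, the same quantization settings can be reproduced at load time as in the sketch below; the adapter repository id is inferred from the model name in this card and is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config mirroring the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Attach the LoRA adapter; the repo id below is an assumption based on the model name.
model = PeftModel.from_pretrained(base_model, "rohansolo/BB-Mixtral-HindiHinglish-8x7B-v0.1")
```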

Environmental Impact

Experiments were conducted using private infrastructure, which has a carbon efficiency of 0.432 kgCO$_2$eq/kWh. A cumulative 94 hours of computation was performed on hardware of type A100 SXM4 80 GB (TDP of 400W).

Total emissions are estimated to be 16.24 kgCO$_2$eq, of which 0 percent was directly offset.

  • Hardware Type: 8 x A100 SXM4 80 GB
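
For context, the reported total follows from energy use times carbon efficiency, treating the 94 hours as cumulative GPU-hours at the stated 400 W draw (an assumption about how the hours were counted):

$$
94\ \mathrm{h} \times 0.4\ \mathrm{kW} \times 0.432\ \mathrm{kgCO_2eq/kWh} \approx 16.24\ \mathrm{kgCO_2eq}
$$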