---
library_name: peft
license: cc-by-nc-4.0
base_model: mistralai/Mixtral-8x7B-v0.1
datasets:
  - HuggingFaceH4/ultrachat_200k
  - rohansolo/BB_HindiHinglishV2
model-index:
  - name: BB-Mixtral-HindiHinglish-8x7B-v0.1
    results: []
language:
  - hi
---

Model Description

Mixtral fine-tuned for Hindi and Hinglish, as part of ongoing experiments by bb deep learning systems.

Model Sources

  • Paper: [More Information Coming Soon]

Training Details

Training Data

A mix of HuggingFaceH4/ultrachat_200k and rohansolo/BB_HindiHinglishV2 was used, for a total of 573,014,566 tokens in Hindi, Romanised Hindi, and English.
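
For reference, a minimal sketch of assembling this mixture with the `datasets` library is shown below; the split and column names for rohansolo/BB_HindiHinglishV2 are assumptions for illustration, not details taken from this card.

```python
from datasets import load_dataset, concatenate_datasets

# Load the two mixture components named above.
# ultrachat_200k ships a "train_sft" split; the split and column names for
# BB_HindiHinglishV2 are assumptions for illustration only.
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
hinglish = load_dataset("rohansolo/BB_HindiHinglishV2", split="train")

# Keep only the chat turns so the two schemas line up before concatenating.
train_mix = concatenate_datasets(
    [ultrachat.select_columns(["messages"]), hinglish.select_columns(["messages"])]
)
print(train_mix)
```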

Training Procedure

The final training loss was 0.8977639613123988.

The model was trained using the following hyperparameters:

  • warmup_steps: 100
  • weight_decay: 0.05
  • num_epochs: 1
  • optimizer: paged_adamw_8bit
  • lr_scheduler: cosine
  • learning_rate: 0.0002
  • lora_r: 32
  • lora_alpha: 16
  • lora_dropout: 0.05
  • lora_target_modules: q_proj, k_proj, v_proj, o_proj, w1, w2, w3
  • lora_target_linear:
  • lora_fan_in_fan_out:
  • lora_modules_to_save: embed_tokens, lm_head
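
Expressed as a `peft` LoraConfig, these settings would look roughly like the sketch below; the task type and bias handling are assumptions, as the card does not list them.

```python
from peft import LoraConfig

# LoRA settings as listed above; w1/w2/w3 are Mixtral's expert MLP projections.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",  # assumption: causal language modelling
)
```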

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
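
As a rough guide, the same quantization settings can be reproduced at load time as in the sketch below; the adapter repository id is inferred from the model name in this card and is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config mirroring the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Attach the LoRA adapter; the repo id below is an assumption based on the model name.
model = PeftModel.from_pretrained(base_model, "rohansolo/BB-Mixtral-HindiHinglish-8x7B-v0.1")
```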

Environmental Impact

Experiments were conducted using private infrastructure, which has a carbon efficiency of 0.432 kgCO$_2$eq/kWh. A cumulative 94 hours of computation was performed on hardware of type A100 SXM4 80 GB (TDP of 400W).

Total emissions are estimated to be 16.24 kgCO$_2$eq, of which 0 percent was directly offset.

  • Hardware Type: 8 x A100 SXM4 80 GB
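
For context, the reported total follows from energy use times carbon efficiency, treating the 94 hours as cumulative GPU-hours at the stated 400 W draw (an assumption about how the hours were counted):

$$
94\ \mathrm{h} \times 0.4\ \mathrm{kW} \times 0.432\ \mathrm{kgCO_2eq/kWh} \approx 16.24\ \mathrm{kgCO_2eq}
$$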