Fine-tuning Gemma for a Foreign Language

#78
deleted

I am attempting to fine-tune Gemma for one of the languages on which it has been pretrained. Could you provide any suggestions regarding the optimal size of the dataset to ensure a noticeable improvement in performance? The best format for the training files? Any other recommendations? Thank you.


@user1357925

Hello friend. I got good results from the Gemma 2B model using this format for the dataset. I did the fine-tuning for Brazilian Portuguese. Here it is.
I have 2 datasets: one of 36k for mental (Gemma 2B) and another of 100k for instruct (Gemma 7B).

def formatting_func(example):
    # Wrap each question/answer pair in Gemma's chat template:
    # a user turn with the question, followed by a model turn with the answer.
    instruction = example['question']
    output = example['answer']
    text = f"<start_of_turn>user\n{instruction}<end_of_turn> <start_of_turn>model\n{output}<end_of_turn>"
    return text
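
For context, here is a minimal sketch of how a formatting function like this could feed TRL's SFTTrainer with a LoRA adapter. The file name, LoRA rank, and training hyperparameters below are placeholders rather than the settings used above, and the exact SFTTrainer arguments vary somewhat between TRL versions.

from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig

# Hypothetical local JSON file with 'question' and 'answer' fields.
dataset = load_dataset("json", data_files="portuguese_qa.json", split="train")

# Pre-build the training text with formatting_func so the trainer only sees
# a plain 'text' column; this sidesteps differences in how TRL versions
# apply a formatting_func internally.
dataset = dataset.map(lambda ex: {"text": formatting_func(ex)})

trainer = SFTTrainer(
    model="google/gemma-2b",  # SFTTrainer loads the model from this id
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
    args=TrainingArguments(
        output_dir="gemma-2b-ptbr",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()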
deleted

Thank you. Would you mind sharing the whole script you used for fine-tuning?

@user1357925
Yeah, sure. Could you send me an email?
rhaymisoncristian@gmail.com, or reach out to me on LinkedIn and I will share the notebook with you.

@rhaymison Is it possible to share the script you used?

@Wielebnyd Sure, could you send me an email so I can share the full notebook with you?

@rhaymison Thank you, I just sent you an email.

@rhaymison Hello sir, I'm interested in your work. Could you share some information about the prompt and the LoRA rank you used, please?

I want to fine-tune Gemma 2B on 40k rows of English and Darija (Moroccan dialect).

@SaadManzur
I received your email, I will give you more info. Thanks.
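
For anyone else wondering about the LoRA side, here is an illustrative configuration with the peft library for Gemma 2B. The rank, alpha, dropout, and target modules below are common starting points, not necessarily the values used by anyone in this thread.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative LoRA settings only: r=16 / alpha=32 is a common starting point.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Gemma attention projections
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how small the trainable LoRA subset is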
