---
license: apache-2.0
tags:
- generated_from_trainer
- instruction fine-tuning
model-index:
- name: flan-t5-small-distil-v2
  results: []
language:
- en
pipeline_tag: text2text-generation
---

# LaMini-FLAN-T5-Small

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository]().

## Model description

We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 61M.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 128
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5

## Training and evaluation data

We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().

## Models

You can download the LaMini model series as follows. Note that not all models perform equally well. More details can be found in our [paper]().
LaMini Language Models collection.

| Name | Architecture | Initialization |
| --- | --- | --- |
| LaMini-T5-61M | encoder-decoder | T5-small |
| LaMini-T5-223M | encoder-decoder | T5-base |
| LaMini-T5-738M | encoder-decoder | T5-large |
| LaMini-Flan-T5-77M | encoder-decoder | Flan-T5-small |
| LaMini-Flan-T5-248M | encoder-decoder | Flan-T5-base |
| LaMini-Flan-T5-783M | encoder-decoder | Flan-T5-large |
| LaMini-Cb-111M | decoder-only | Cerebras-GPT-111M |
| LaMini-Cb-256M | decoder-only | Cerebras-GPT-256M |
| LaMini-Cb-590M | decoder-only | Cerebras-GPT-590M |
| LaMini-Cb-1.3B | decoder-only | Cerebras-GPT-1.3B |
| LaMini-GPT-124M | decoder-only | GPT-2 |
| LaMini-GPT-774M | decoder-only | GPT-2 large |
| LaMini-GPT-1.5B | decoder-only | GPT-2 xl |
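
The architecture column matters when loading a model with the `transformers` pipeline API: encoder-decoder checkpoints are served by the `text2text-generation` task, while decoder-only checkpoints use `text-generation`. The sketch below encodes the collection listed above as a lookup. The helper `pipeline_task_for` and both dictionaries are hypothetical names introduced here for illustration; they are not part of this repository or of `transformers`.

```python
# Hypothetical mapping from the "Architecture" column to a pipeline task name.
ARCHITECTURE_TO_TASK = {
    "encoder-decoder": "text2text-generation",  # T5 and Flan-T5 variants
    "decoder-only": "text-generation",          # Cerebras-GPT and GPT-2 variants
}

# Rows copied from the collection listed above: model name -> architecture.
LAMINI_MODELS = {
    "LaMini-T5-61M": "encoder-decoder",
    "LaMini-T5-223M": "encoder-decoder",
    "LaMini-T5-738M": "encoder-decoder",
    "LaMini-Flan-T5-77M": "encoder-decoder",
    "LaMini-Flan-T5-248M": "encoder-decoder",
    "LaMini-Flan-T5-783M": "encoder-decoder",
    "LaMini-Cb-111M": "decoder-only",
    "LaMini-Cb-256M": "decoder-only",
    "LaMini-Cb-590M": "decoder-only",
    "LaMini-Cb-1.3B": "decoder-only",
    "LaMini-GPT-124M": "decoder-only",
    "LaMini-GPT-774M": "decoder-only",
    "LaMini-GPT-1.5B": "decoder-only",
}


def pipeline_task_for(model_name: str) -> str:
    """Return the pipeline task string for a model in the collection."""
    architecture = LAMINI_MODELS[model_name]
    return ARCHITECTURE_TO_TASK[architecture]


print(pipeline_task_for("LaMini-Flan-T5-77M"))  # text2text-generation
print(pipeline_task_for("LaMini-GPT-124M"))     # text-generation
```

The returned string can be passed as the first argument to `pipeline(...)` in the usage examples that follow.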
## Use

### CPU
```python
# pip install -q transformers
from transformers import pipeline

# Replace the placeholder with the checkpoint you want to load.
checkpoint = "{model_name}"

# use_auth_token=True sends your stored Hugging Face token with the download request.
generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']

print("Response:", generated_text)
```
### GPU
```python
# pip install -q transformers
from transformers import pipeline

# Replace the placeholder with the checkpoint you want to load.
checkpoint = "{model_name}"

# device=0 places the model on the first CUDA GPU.
generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']

print("Response:", generated_text)
```
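
Both examples pass `repetition_penalty=1.5`. As a rough illustration of what that setting does, the sketch below applies the usual rule to a toy score list: scores of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative, so those tokens become less likely either way. The function `apply_repetition_penalty` is a hypothetical pure-Python stand-in written for this card; it is a simplified sketch, not the code `transformers` runs.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.5):
    """Return a copy of `logits` with already-generated tokens down-weighted."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero, negative scores move further
        # below zero, so the token is less likely whenever penalty > 1.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [3.0, -1.0, 0.5, 2.0]  # toy scores for a 4-token vocabulary
generated = [0, 1]              # tokens 0 and 1 were already produced
print(apply_repetition_penalty(logits, generated))  # [2.0, -1.5, 0.5, 2.0]
```

A penalty of 1.0 leaves the scores unchanged; larger values discourage repeated tokens more strongly.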
## Intended uses & limitations

More information needed

### Framework versions

- Transformers 4.27.0
- Pytorch 2.0.0+cu117
- Datasets 2.2.0
- Tokenizers 0.13.2