legolasyiu committed on
Commit
b8d2592
1 Parent(s): 5ec2f2a

Update README.md

Files changed (1)
  1. README.md +68 -0
README.md CHANGED
@@ -35,6 +35,74 @@ The model is intended for commercial and research use in multiple languages. The
 
  Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features.
 
+ ## Usage
+
+ ### Requirements
+ The Phi-3 family has been integrated in `transformers` version `4.43.0`. The currently installed `transformers` version can be verified with `pip list | grep transformers`.
+
+ Examples of required packages:
+ ```
+ flash_attn==2.5.8
+ torch==2.3.1
+ accelerate==0.31.0
+ transformers==4.43.0
+ ```
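+
+ The same requirement can also be checked programmatically; a minimal sketch, assuming the `packaging` helper (which ships as a `transformers` dependency) is available:
+
+ ```python
+ # Fail fast if the installed transformers is older than 4.43.0.
+ from importlib.metadata import version
+ from packaging.version import Version
+
+ installed = Version(version("transformers"))
+ assert installed >= Version("4.43.0"), f"transformers {installed} is older than 4.43.0"
+ ```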
+
+ Phi-3.5-mini-instruct is also available in [Azure AI Studio](https://aka.ms/try-phi3.5mini).
+
+ ### Tokenizer
+
+ Phi-3.5-mini-instruct supports a vocabulary size of up to `32064` tokens. The [tokenizer files](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/blob/main/added_tokens.json) already provide placeholder tokens that can be used for downstream fine-tuning, and the vocabulary can also be extended up to the model's full vocabulary size.
+
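+ As an illustration, a minimal sketch of registering an extra special token before fine-tuning (the token name `<|custom_token|>` is a hypothetical example, not part of the released files):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
+ # "<|custom_token|>" is a hypothetical example; keep len(tokenizer)
+ # within the model's 32064-token vocabulary.
+ added = tokenizer.add_special_tokens({"additional_special_tokens": ["<|custom_token|>"]})
+ print(f"added {added} token(s); vocabulary size is now {len(tokenizer)}")
+ ```
+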
+ ### Input Formats
+ Given the nature of the training data, the Phi-3.5-mini-instruct model is best suited for prompts using the following chat format:
+
+ ```
+ <|system|>
+ You are a helpful assistant.<|end|>
+ <|user|>
+ How to explain Internet for a medieval knight?<|end|>
+ <|assistant|>
+ ```
+
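+ Rather than assembling this string by hand, the tokenizer's chat template can produce the same format; a minimal sketch using the standard `apply_chat_template` API:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "How to explain Internet for a medieval knight?"},
+ ]
+ # add_generation_prompt=True appends the <|assistant|> header so the model replies next.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print(prompt)
+ ```
+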
+ ### Loading the model locally
+ After obtaining the Phi-3.5-mini-instruct model checkpoint, users can run inference with this sample code:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ torch.random.manual_seed(0)
+
+ # Load the checkpoint onto the GPU; torch_dtype="auto" keeps the checkpoint's dtype.
+ model = AutoModelForCausalLM.from_pretrained(
+     "microsoft/Phi-3.5-mini-instruct",
+     device_map="cuda",
+     torch_dtype="auto",
+     trust_remote_code=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
+
+ # Multi-turn chat history in the format expected by the chat template.
+ messages = [
+     {"role": "system", "content": "You are a helpful AI assistant."},
+     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
+     {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
+     {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
+ ]
+
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ # Greedy decoding: with do_sample=False the temperature setting has no effect.
+ generation_args = {
+     "max_new_tokens": 500,
+     "return_full_text": False,
+     "temperature": 0.0,
+     "do_sample": False,
+ }
+
+ output = pipe(messages, **generation_args)
+ print(output[0]['generated_text'])
+ ```
+
+ Note: to use flash attention, call _AutoModelForCausalLM.from_pretrained()_ with _attn_implementation="flash_attention_2"_.
+
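+ A minimal sketch of that call (flash attention additionally requires a supported GPU and the `flash_attn` package listed earlier):
+
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "microsoft/Phi-3.5-mini-instruct",
+     device_map="cuda",
+     torch_dtype="auto",  # flash attention needs fp16/bf16 weights; "auto" uses the checkpoint's dtype
+     trust_remote_code=True,
+     attn_implementation="flash_attention_2",
+ )
+ ```
+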
  # Uploaded model
 
  - **Developed by:** EpistemeAI