How much GPU is needed to load the Phi-3.5-MoE-instruct model

#44
by cyt78 - opened

Hi all, I'm trying to load the model on a g5.48xlarge instance on AWS, but I'm getting an out-of-GPU-memory error. The instance has 192 GB of GPU memory overall (8 A10G GPUs with 24 GB each). I thought this should be sufficient to load the model, no? The code I'm using is the one shared on the main page:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

Is there a way to make this fit into the g5.48xlarge? Thanks in advance.

Microsoft org

You can try changing device_map="cuda" to device_map="auto". With "cuda", the entire model is placed on a single GPU, while "auto" shards the layers across all available GPUs. The model has roughly 42B parameters, so the bf16 weights alone are on the order of 84 GB, far more than one 24 GB A10G but well within the 192 GB you have across all eight.
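
For example (a minimal sketch; device_map="auto" requires the accelerate package to be installed):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="auto",  # shard layers across all visible GPUs instead of one
    torch_dtype="auto",
    trust_remote_code=True,
)

# Optional: inspect which GPU each block was assigned to
print(model.hf_device_map)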

Thanks, @LiyuanLucasLiu. I managed to fit the model on the instance after making the proposed change. One more follow-up question: is it also possible to run 8-bit inference with this model, and if so, what is the impact of quantization on performance?
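
For context, the 8-bit load I have in mind would look something like this (a sketch, assuming the bitsandbytes package is installed; I haven't confirmed how well the MoE layers quantize):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit weight quantization via bitsandbytes; this replaces torch_dtype="auto"
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)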
