How much GPU is needed to load the Phi-3.5-MoE-instruct model

#44
by cyt78 - opened

Hi all, I'm trying to load the model on a g5.48xlarge instance on AWS, but I'm getting an out-of-GPU-memory error. The instance has 192 GB of GPU memory overall (8 A10G GPUs with 24 GB each). I thought this should be sufficient to load the model, no? The code I'm using is the one shared on the main page:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

Is there a way to make this fit into the g5.48xlarge? Thanks in advance.

Microsoft org

You can try changing device_map="cuda" to device_map="auto". With "cuda", the entire model is placed on a single GPU, while "auto" shards the layers across all available GPUs. The model has roughly 42B parameters, so the bf16 weights alone are on the order of 84 GB, far more than one 24 GB A10G but well within the 192 GB you have across all eight.
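
For example (a minimal sketch; device_map="auto" requires the accelerate package to be installed):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="auto",  # shard layers across all visible GPUs instead of one
    torch_dtype="auto",
    trust_remote_code=True,
)

# Optional: inspect which GPU each block was assigned to
print(model.hf_device_map)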

Thanks, @LiyuanLucasLiu. I managed to fit the model on the instance after making the proposed change. One more follow-up question: is it also possible to run 8-bit inference with this model, and if so, what is the impact of quantization on performance?
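
For context, the 8-bit load I have in mind would look something like this (a sketch, assuming the bitsandbytes package is installed; I haven't confirmed how well the MoE layers quantize):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit weight quantization via bitsandbytes; this replaces torch_dtype="auto"
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-MoE-instruct",
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)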
