Which AWS Instance is good to run Llama-2-70B

#31
by iamajithkumar - opened

Which AWS Instance is good to run Llama-2-70B

I'm not familiar with the choice of AWS instances. But in general, to run a 70B GPTQ model you need:

  • a 48GB or 80GB GPU
    • Or 2 x 24GB, but 1 x 48GB or bigger is better
  • 64+ GB RAM
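A quick back-of-the-envelope calculation shows why 48GB works: at 4-bit (GPTQ) quantization, the weights alone take roughly half a byte per parameter. This is a rough sketch; the extra headroom needed for the KV cache and activations depends on context length and batch size.

```python
# Rough VRAM estimate for a 70B model quantized to 4 bits (GPTQ).
# The overhead for KV cache / activations is workload-dependent and not included.
params = 70e9
bytes_per_param = 0.5                      # 4-bit quantization
weights_gb = params * bytes_per_param / 1024**3
print(f"4-bit weights: {weights_gb:.0f} GB")   # ~33 GB
```

That leaves some room on a single 48GB or 80GB card, while on 2 x 24GB the weights barely fit and must be split across GPUs.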

Which AWS Instance is good to run Llama-2-70B

None.

I am a heavy AWS user ($10k+ per month) and they are lacking in good GPU options.

You have the g5.2xlarge (8 vCPUs, 32 GB RAM) with 1x A10 (24 GB VRAM). With a savings plan, you can get it at $0.49/h.

But from there, things get complicated. If you want more than one A10, the lowest option is the g5.12xlarge, which also comes packed with 48 vCPUs and 192 GB of RAM alongside 4x A10, but it costs more than $5/h. The next option is 8x A10 in the g5.48xlarge for about $15/h, which also gets you 192 vCPUs and 768 GB of RAM.

There is no option with an A6000.

The only option with A100s is the p4d.24xlarge at more than $20/h for 8x A100.

The only option with H100s is the p5.48xlarge at almost $100/h.
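Putting the numbers above side by side makes the per-GPU cost easier to compare. The prices are the approximate figures from this post (savings-plan rate for the g5.2xlarge), so treat them as illustrative rather than current list prices:

```python
# Per-GPU hourly cost for the instance types mentioned above.
# Prices are approximate, taken from this discussion, and may be outdated.
instances = {
    "g5.2xlarge":   {"gpus": 1, "gpu": "A10 (24 GB)", "usd_per_h": 0.49},
    "g5.12xlarge":  {"gpus": 4, "gpu": "A10 (24 GB)", "usd_per_h": 5.00},
    "g5.48xlarge":  {"gpus": 8, "gpu": "A10 (24 GB)", "usd_per_h": 15.00},
    "p4d.24xlarge": {"gpus": 8, "gpu": "A100",        "usd_per_h": 20.00},
    "p5.48xlarge":  {"gpus": 8, "gpu": "H100",        "usd_per_h": 100.00},
}
for name, spec in instances.items():
    per_gpu = spec["usd_per_h"] / spec["gpus"]
    print(f"{name:13s} {spec['gpu']:12s} ${per_gpu:5.2f} per GPU-hour")
```

Per GPU-hour the A10 instances are cheap, but none of them gives you a single GPU with more than 24 GB of VRAM.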

I am in contact with my AWS account manager, explaining that I currently buy all the GPU time I need from other clouds, that it is annoying to move workloads around, and that I might jump ship entirely if they don't fix this. Not sure if that will work, but for now AWS is unusable for anything requiring more than 24 GB of VRAM.

iamajithkumar changed discussion status to closed

@iongpt

What other services would you recommend?

And what technologies (library/container/EFS) are you using to run LLAMA 2 on AWS?

Also, do you have an opinion on inf2 instances? I heard Llama 2 can be run on those too.

I just cloned https://huggingface.co/meta-llama/Llama-2-70b-hf and tried it on an AWS g5.8xlarge instance. That instance has 128 GB of CPU RAM, which seems not to be enough to load this 70-billion-parameter model.
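That matches a rough estimate: the Llama-2-70b-hf checkpoint is fp16, so a naive load materializes about 2 bytes per parameter in memory, which already exceeds 128 GB before any overhead. A quick sanity check:

```python
# Why 128 GB of CPU RAM on a g5.8xlarge is not enough for a naive fp16 load.
params = 70e9
bytes_per_param = 2                        # fp16 checkpoint
fp16_gb = params * bytes_per_param / 1024**3
print(f"fp16 weights: {fp16_gb:.0f} GB")   # ~130 GB, above the 128 GB available
```

If you load it through `transformers`, passing `low_cpu_mem_usage=True` and `device_map="auto"` to `from_pretrained` can reduce peak host RAM by streaming weights shard by shard, though whether that alone gets you under 128 GB for a 70B model is something you'd have to verify.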
