How to reduce GPU memory?

#2
by ulrika-cyl - opened


During my inference test, GPU memory usage went up to 70 GB.

Same here. Why does the memory usage of the INT4 model reach 70 GB?

OpenGVLab org

LMDeploy pre-allocates the KV cache, so the reported usage is much larger than the model weights alone. You can set the `cache_max_entry_count` parameter to reduce the maximum GPU memory usage. See https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html#memory-usage-testing
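A minimal sketch of what that looks like in code, assuming the LMDeploy Python API and an InternVL2 AWQ checkpoint (the model path and prompt below are placeholders, not from this thread). `cache_max_entry_count` is the fraction of free GPU memory reserved for the KV cache after the weights are loaded (default 0.8); lowering it shrinks the pre-allocation:

```python
# Sketch: shrink LMDeploy's pre-allocated KV cache.
# Assumptions: lmdeploy is installed, a CUDA GPU is available, and the
# model path below is a placeholder for whatever checkpoint you serve.
from lmdeploy import pipeline, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    # Reserve only ~20% of the remaining GPU memory for the KV cache
    # instead of the default 0.8, trading concurrency for lower usage.
    cache_max_entry_count=0.2,
)

pipe = pipeline('OpenGVLab/InternVL2-8B-AWQ', backend_config=backend_config)
response = pipe('Describe this image.')
print(response.text)
```

Lowering the value caps peak memory but leaves less room for concurrent requests and long contexts, so tune it to the smallest value that still fits your workload.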

czczup changed discussion status to closed
