
How to split a model

#6
by nib12345 - opened

Hi guys,
Does anyone have an idea?

How to split
mpt-30b-chat.ggmlv0.q4_1.bin

to

mpt-30b-chat.ggmlv0.q4_1_00001_of_00004.bin
mpt-30b-chat.ggmlv0.q4_1_00002_of_00004.bin
mpt-30b-chat.ggmlv0.q4_1_00003_of_00004.bin
mpt-30b-chat.ggmlv0.q4_1_00004_of_00004.bin

so it can be loaded on Kaggle (Kaggle has a RAM limit).

If anyone has an idea, please share.
I don't know much about developing models; I'm just a full-stack developer.
Thanks.

That's not possible: GGML does not support multi-part GGML files.

Using KoboldCpp you can offload some of the model to GPU (if you have one), which will reduce RAM usage accordingly.

But there's no GPU support for MPT GGML models from Python code at this time; it only works through the KoboldCpp UI.

nib12345 changed discussion status to closed
