add "mm_spatial_pool_mode" to config.

#3
by litianjian - opened

"mm_spatial_pool_mode" is still a configuration, similar to the llava_next_video.

Llava Hugging Face org

@litianjian yes, but all LLaVA-OV models use only bilinear interpolation, so the HF code doesn't support any other pooling modes.
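
A minimal sketch of what that bilinear pooling looks like (the tensor names and the `scale` parameter are illustrative, not the exact HF implementation):

```python
# Hedged sketch of bilinear pooling over vision features, roughly what the
# HF LLaVA-OneVision modeling code hard-codes; shapes and names are illustrative.
import math
import torch
import torch.nn.functional as F

def bilinear_pool(image_features: torch.Tensor, scale: int = 2) -> torch.Tensor:
    # image_features: (batch, num_patches, hidden_size), with num_patches = side * side
    batch, num_patches, dim = image_features.shape
    side = int(math.sqrt(num_patches))
    feats = image_features.transpose(1, 2).reshape(batch, dim, side, side)
    feats = F.interpolate(feats, size=(side // scale, side // scale), mode="bilinear")
    return feats.flatten(2).transpose(1, 2)  # (batch, (side // scale) ** 2, hidden_size)
```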

Thank you for your reply. This flag affects the execution of the code in the llava-ov git repo, and it also takes effect in my project.
Of course, all released llava-ov models use only bilinear interpolation, and the HF code adopts the default value. I can submit an issue against the HF code after this.

Llava Hugging Face org

@litianjian can you please elaborate on how this affects the llava-ov repo, as the llava repo is not expected to support HF-style checkpoints? If you're training/tuning a model in the llava repo and want to deploy with HF, you can convert the weights to HF style with our conversion script. It is available in the transformers repo under model/llava_onevision/convert_weights.py

Thank you for your reply. We are training llava-ov models with the llava-ov repo. Meanwhile, I focus on model development. vLLM, which depends on HF, is our first choice, so the entire workflow is "llava-ov -> HF -> vLLM".

Llava Hugging Face org

I see now. In that case I recommend converting your weights to HF format with this script (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py), which can also upload them to the Hub after conversion. Since vLLM works with HF models, I don't think it expects the config key 'mm_spatial_pool_mode'.

Also, I am not sure the llava-ov architecture is supported yet; I just found this issue in the repo (https://github.com/vllm-project/vllm/issues/7420). LMK if the suggestions help :)
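
If it helps, a quick sanity check after conversion could look like this (the repo id is a placeholder for your converted checkpoint, not a real model):

```python
# Hedged sketch: load a converted checkpoint with the HF classes and inspect
# the config (placeholder repo id; no `mm_spatial_pool_mode` key is expected).
from transformers import LlavaOnevisionForConditionalGeneration, LlavaOnevisionProcessor

repo_id = "your-org/your-llava-ov-hf"  # placeholder
processor = LlavaOnevisionProcessor.from_pretrained(repo_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(repo_id)
print(model.config)
```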

The PR (https://github.com/vllm-project/vllm/pull/8486) adding support for the llava-ov architecture in vLLM will be merged soon.
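
Once that lands, the workflow above should reduce to pointing vLLM at the converted HF checkpoint. A rough sketch, assuming the hub id and prompt format below (check the vLLM docs for the exact chat template):

```python
# Hedged sketch: serving an HF-format LLaVA-OneVision checkpoint with vLLM
# after the PR is merged; model id and prompt format are assumptions.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-onevision-qwen2-7b-ov-hf")  # assumed hub id
image = Image.open("example.jpg")
outputs = llm.generate(
    {
        "prompt": "USER: <image>\nWhat is shown in this picture? ASSISTANT:",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```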

Llava Hugging Face org

@litianjian Super cool! From the PR it seems like the pooling mode no longer needs to be in the config file?

vLLM's implementation is based on LlavaOnevisionConfig and LlavaOnevisionForConditionalGeneration from HF. "pooling_mode" is not supported in HF, so I could not include it in vLLM's implementation.

Llava Hugging Face org

Okay, cool. Then I guess you can use the same code as in HF for packing image features. Feel free to close the PR, and thanks a lot for working on vLLM support :)

litianjian changed pull request status to closed
