Patch Sentence Transformers integration

#2
by tomaarsen - opened

Hello!

Congratulations on your release! Well done 👍

Pull Request overview

  • Patch Sentence Transformers integration, in particular:
    • Rename "1_Pool" to "1_Pooling": the latter is referenced in modules.json and will be used to load the pooling configuration.
    • Update the pooling configuration to also include the prompt tokens in the pooling. Excluding them previously caused a slight difference between the transformers and sentence-transformers outputs.
  • Simplify the code snippet:
    • max_seq_length is now defined in sentence_bert_config.json.
    • a Normalize module is added to modules.json, so all outputs are L2-normalized even without specifying normalize_embeddings=True.
  • Add the instructions to the prompts dictionary in config_sentence_transformers.json. This allows for model.encode(my_texts, prompt_name="nq"); see the sketch after this list.
  • Add a sentence-transformers tag, making the model easier to find when searching for embedding models under https://huggingface.co/models?library=sentence-transformers&sort=trending
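Put together, loading and encoding with the patched configuration looks roughly like the sketch below. The repository id, queries, and passages are placeholders, and trust_remote_code=True is an assumption that may not be needed for this model; prompt_name="nq" and the automatic normalization follow from the changes listed above.

```python
from sentence_transformers import SentenceTransformer

# Placeholder repository id; substitute the actual model repository.
model = SentenceTransformer("openbmb/model-name", trust_remote_code=True)

queries = ["What is the capital of China?"]
passages = [
    "Beijing is the capital of China.",
    "Paris is the capital of France.",
]

# prompt_name="nq" looks up the matching instruction in the "prompts"
# dictionary of config_sentence_transformers.json and prepends it to each query.
query_embeddings = model.encode(queries, prompt_name="nq")

# No normalize_embeddings=True needed: the Normalize module in modules.json
# already L2-normalizes every output.
passage_embeddings = model.encode(passages)

# On L2-normalized embeddings, the dot product equals cosine similarity.
print(query_embeddings @ passage_embeddings.T)
```

Because normalization now lives in modules.json, downstream users get consistent embeddings whether or not they remember to pass normalize_embeddings=True.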

Details

I ran the updated script in the README, and it gave me [[0.35365450382232666, 0.18592746555805206]], which is the same as what I get when running the transformers snippet.
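For illustration, the transformers-side computation that should now match is roughly the following sketch. It assumes mean pooling over all non-padding tokens (prompt included, per the 1_Pooling change above) followed by L2 normalization; the repository id and prompt string are placeholders, not values taken from this PR.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Placeholder repository id and prompt; the real values come from the model
# card and the "prompts" dictionary in config_sentence_transformers.json.
model_id = "openbmb/model-name"
prompt = "Query: "

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

texts = [prompt + "What is the capital of China?"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state

# Mean pooling over every non-padding token, prompt included, matching the
# updated 1_Pooling configuration.
mask = batch["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# The Normalize module corresponds to a plain L2 normalization.
embeddings = F.normalize(embeddings, p=2, dim=1)
```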

  • Tom Aarsen
tomaarsen changed pull request status to open
Kaguya-19 changed pull request status to merged
OpenBMB org

Thank you!

OpenBMB org

Thank you for your helpful work!
