Embeddings model used?

#1
by DSNlau - opened

Hello, great job by the way!
May I ask which embeddings model you were using for the documents?
Thanks a lot!

Projecte Aina org

We are using BAAI/bge-m3, which seems to work very well for the main languages in this project, Spanish and Catalan. We tried a few other multilingual embedding models but got the best results with BAAI/bge-m3. These embeddings support many languages and are trained specifically to match question-answer pairs. While BAAI does not report any evaluation results specifically for Catalan, it does claim that its embeddings work well with low-resource languages. That suggests decent results can be expected even if Catalan was not strongly represented in the training data. Our own evaluations confirm this.
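For anyone who wants to try the same model for document retrieval, here is a minimal sketch. It is not the Projecte Aina pipeline. It assumes the `sentence-transformers` package is installed and that `BAAI/bge-m3` loads through `SentenceTransformer`, and it uses only the model's dense vectors. The helper names (`cosine`, `rank_documents`, `make_bge_m3_encoder`) are hypothetical. The ranking logic is plain Python, so it runs without downloading the model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Vector, doc_vecs: Sequence[Vector], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    # Stable sort: documents with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def make_bge_m3_encoder() -> Callable[[List[str]], List[List[float]]]:
    """Hypothetical helper: load BAAI/bge-m3 via sentence-transformers (assumption).

    The import is inside the function so the ranking code above can be used
    and tested without the package or the model download.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("BAAI/bge-m3")

    def encode(texts: List[str]) -> List[List[float]]:
        # normalize_embeddings=True returns unit-length vectors.
        return model.encode(texts, normalize_embeddings=True).tolist()

    return encode
```

To use it, call `encode = make_bge_m3_encoder()`, embed the documents once with `encode(docs)`, then pass `encode([question])[0]` and the stored document vectors to `rank_documents`. Because the vectors are normalized, cosine similarity equals the dot product, so a vector index that ranks by inner product would return the same order.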
