jinaai/jina-clip-v1


bwang0911 committed on May 30

Commit 0aaf6db (1 parent: ea6cbd8)

Update README.md

Files changed (1)

README.md +28 -0

README.md CHANGED

@@ -66,6 +66,34 @@ print(cos_sim(text_embeddings[0], text_embeddings[1])) # text embedding similari
 print(cos_sim(text_embeddings[0], image_embeddings[0])) # text-image cross-modal similarity
 ```
+**Notice: our empirical study shows that text-text cosine similarity is typically larger than text-image cosine similarity!**
+If you want to merge the two scores, we recommend two approaches:
+1. Take a weighted average of the text-text and text-image similarities:
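+```python
+import numpy as np
+
+alpha = 0.6  # weight for text search (query vs. document text)
+beta = 0.4  # weight for cross-modal search (query vs. image)
+
+def combine_scores(query_document_sim, text_image_sim):
+    # weighted sum of the two cosine similarities (scalars or arrays)
+    return alpha * np.asarray(query_document_sim) + beta * np.asarray(text_image_sim)
+```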
+2. Apply z-score normalization to each set of scores before merging them:
+```python
+import numpy as np
+
+def merge_z_scores(cos_sim_query_documents, cos_sim_text_images):
+    # standardize each score list to zero mean and unit variance
+    query_document_sim_mean = np.mean(cos_sim_query_documents)
+    query_document_sim_std = np.std(cos_sim_query_documents)
+    text_image_sim_mean = np.mean(cos_sim_text_images)
+    text_image_sim_std = np.std(cos_sim_text_images)
+    query_document_sim_normalized = (np.asarray(cos_sim_query_documents) - query_document_sim_mean) / query_document_sim_std
+    text_image_sim_normalized = (np.asarray(cos_sim_text_images) - text_image_sim_mean) / text_image_sim_std
+    # sum normalized scores
+    return query_document_sim_normalized + text_image_sim_normalized
+```
 ## Performance
 ### Text-Image Retrieval
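
The two merging strategies recommended in this README change can be compared on a small example. This is a minimal sketch, not code from the jina-clip-v1 repository: the helper names `z_score`, `weighted_average` and `z_score_sum` are hypothetical, and the similarity values are made up for illustration rather than measured. It assumes the text-text and text-image cosine similarities for the same candidates have already been computed, for example with `cos_sim` as in the README.

```python
import numpy as np


def z_score(scores: np.ndarray) -> np.ndarray:
    # Standardize to zero mean and unit variance across the candidate set.
    return (scores - scores.mean()) / scores.std()


def weighted_average(text_text: np.ndarray, text_image: np.ndarray,
                     alpha: float = 0.6, beta: float = 0.4) -> np.ndarray:
    # Strategy 1 (hypothetical helper): fixed weights on the raw cosine similarities.
    return alpha * text_text + beta * text_image


def z_score_sum(text_text: np.ndarray, text_image: np.ndarray) -> np.ndarray:
    # Strategy 2 (hypothetical helper): normalize each score list separately, then add them.
    return z_score(text_text) + z_score(text_image)


# Hypothetical scores for one query against three candidates (made-up values,
# chosen so that text-text scores sit higher than text-image scores).
text_text = np.array([0.90, 0.70, 0.80])
text_image = np.array([0.20, 0.30, 0.26])

# Candidate indices sorted from best to worst under each strategy.
print(np.argsort(-weighted_average(text_text, text_image)).tolist())  # [0, 2, 1]
print(np.argsort(-z_score_sum(text_text, text_image)).tolist())       # [2, 0, 1]
```

Adding a constant to every score in one list leaves the weighted-average ranking unchanged, so in this setup the weights interact with how spread out each list is rather than only with its level. Z-score normalization equalizes the spread as well, which is why the two strategies can rank the same candidates differently, as they do here.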