Commit 5f25823 by zpn
1 Parent(s): 856ca13

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
````diff
@@ -2661,12 +2661,15 @@ Training data to train the models is released in its entirety. For more details,
 
 ## Usage
 
+Note `nomic-embed-text` requires prefixes! We support the prefixes `[search_query, search_document, classification, clustering]`.
+For retrieval applications, you should prepend `search_document` for all your documents and `search_query` for your queries.
+
 ### Sentence Transformers
 ```python
 from sentence_transformers import SentenceTransformer
 
 model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
-sentences = ['What is TSNE?', 'Who is Laurens van der Maaten?']
+sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -2683,7 +2686,7 @@ def mean_pooling(model_output, attention_mask):
     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
 
-sentences = ['What is TSNE?', 'Who is Laurens van der Maaten?']
+sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
 
 tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
 model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
````
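The prefix rule this commit adds to the README (documents get `search_document`, queries get `search_query`) can be illustrated with a small, model-free sketch. It assumes, following the commit's own example strings such as `'search_query: What is TSNE?'`, that a prefix is written as `<task>: <text>`. The helpers `add_prefix` and `cosine`, the sample documents, and the toy vectors are hypothetical; real embeddings would come from `model.encode(...)` as in the README snippet, which needs the model weights.

```python
import math
from typing import List, Sequence

# Task prefixes listed in the README note added by this commit.
VALID_PREFIXES = ("search_query", "search_document", "classification", "clustering")


def add_prefix(texts: Sequence[str], task: str) -> List[str]:
    """Prepend '<task>: ' to every text (hypothetical helper, not a library API).

    The '<task>: <text>' format mirrors the commit's example strings,
    e.g. 'search_query: What is TSNE?'.
    """
    if task not in VALID_PREFIXES:
        raise ValueError(f"unknown prefix {task!r}; expected one of {VALID_PREFIXES}")
    return [f"{task}: {text}" for text in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Retrieval: documents get 'search_document', queries get 'search_query'.
documents = add_prefix(
    ["t-SNE is a technique for visualizing high-dimensional data.",
     "Paris is the capital of France."],
    "search_document",
)
queries = add_prefix(["What is TSNE?"], "search_query")

# With the real model (downloads weights), the embeddings would be produced as:
#   model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
#   doc_vecs = model.encode(documents); query_vec = model.encode(queries)[0]
# Hypothetical toy vectors stand in for those embeddings so the sketch runs offline.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [[0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]

# Rank document indices by similarity to the query, best match first.
ranking = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
print(documents)
print(queries)
print(ranking)
```

The same prefixed strings are what the second hunk passes to the tokenizer in the Transformers path; only the string construction changes, not the pooling.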