---
language:
- en
---
[sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE) fine-tuned on an instructional question-and-answer dataset. The model is evaluated with **Precision at K** (p@K) and **Mean reciprocal rank** (MRR).

Precision at K is simple to understand and implement, but it has an important disadvantage: it does not take into account the order of the items in the top K. If only one item out of ten is guessed correctly, it does not matter whether it was in first or last place: p@10 = 0.1 in either case. Yet the first case is clearly much better.

Mean reciprocal rank is the reciprocal of the rank of the first correctly guessed item, averaged over all queries. It lies in the range [0, 1] and takes the position of items into account. Unfortunately, it does so for only one item, the first correctly predicted one, and ignores all subsequent items.
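
To make the difference between the two metrics concrete, here is a minimal sketch in plain Python. The function names and the boolean-list representation of a ranking are assumptions made for illustration; this is not the evaluation code used for this model.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[bool]]) -> float:
    """Average of the reciprocal rank over all queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


# Two hypothetical rankings with one correct item each: first vs. last place.
hit_first = [True] + [False] * 9
hit_last = [False] * 9 + [True]

print(precision_at_k(hit_first, 10), precision_at_k(hit_last, 10))  # 0.1 0.1
print(reciprocal_rank(hit_first), reciprocal_rank(hit_last))        # 1.0 0.1
print(mean_reciprocal_rank([hit_first, hit_last]))
```

Both rankings get the same p@10, while the reciprocal rank separates them.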

Evaluation results:
```text
p@1: 52 %
p@3: 66 %
p@5: 73 %
p@10: 79 %
p@15: 82 %
MRR: 62 %
```

Usage:
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("zjkarina/LaBSE-instructDialogs")
model = AutoModel.from_pretrained("zjkarina/LaBSE-instructDialogs")

sentences = [
    "List 5 reasons why someone should learn to code",
    "Describe the sound of the wind on a sunny day.",
]
# Tokenize the batch, padding to the longest sentence and truncating at 64 tokens.
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    model_output = model(**encoded_input)

# Use the pooled output as the sentence embedding and L2-normalize each row.
embeddings = model_output.pooler_output
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings)
```
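
Because the embeddings are L2-normalized, the dot product of two of them is their cosine similarity, which can be used to rank candidate answers for a question. The sketch below is dependency-free and uses toy unit vectors in place of real model output; the `rank_by_cosine` helper and the example values are hypothetical.

```python
from typing import List, Sequence


def rank_by_cosine(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted from most to least similar to the query.

    Assumes every vector is already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy unit vectors standing in for model embeddings (hypothetical values).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
print(rank_by_cosine(query, candidates))  # [2, 1, 0]
```

With real tensors, the same scores can be computed for a whole batch with one matrix product of the query embeddings and the transposed candidate embeddings.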