---
language:
- ru
- ru-RU
tags:
- mbart
inference:
  parameters:
    no_repeat_ngram_size: 4
    top_k: 0
    num_beams: 5
datasets:
- IlyaGusev/gazeta
- samsum
- samsum (translated to RU)
widget:
- text: | 
    Джефф: Могу ли я обучить модель 🤗 Transformers на Amazon SageMaker? 
    Филипп: Конечно, вы можете использовать новый контейнер для глубокого обучения HuggingFace. 
    Джефф: Хорошо.
    Джефф: и как я могу начать? 
    Джефф: где я могу найти документацию? 
    Филипп: ок, ок, здесь можно найти все: https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face

model-index:
- name: "mbart_ruDialogSum"
  results:
  - task: 
      name: Abstractive Dialogue Summarization
      type: abstractive-text-summarization 
    dataset:
      name: "SAMSum Corpus (translated to Russian)" 
      type: samsum
    metrics:
       - name: Validation ROUGE-1
         type: rouge-1
         value: 34.5
       - name: Validation ROUGE-L
         type: rouge-l
         value: 33
       - name: Test ROUGE-1
         type: rouge-1
         value: 31
       - name: Test ROUGE-L
         type: rouge-l
         value: 28
---
### 📝 Description

MBart for Russian summarization, fine-tuned for **dialogue** summarization.


This model was first fine-tuned by [Ilya Gusev](https://hf.co/IlyaGusev) on the [Gazeta dataset](https://huggingface.co/datasets/IlyaGusev/gazeta). We then **fine-tuned** that model on the [SAMSum dataset](https://huggingface.co/datasets/samsum) **translated to Russian** with the Google Translate API.
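
For context, here is a minimal sketch of how such a translation might be done. The exact translation pipeline used for this model is not published, so this is only an illustration assuming the `google-cloud-translate` client and valid credentials:

```python
# Hypothetical sketch: translating SAMSum dialogues to Russian with the
# google-cloud-translate client. Assumes GOOGLE_APPLICATION_CREDENTIALS is set.
from datasets import load_dataset
from google.cloud import translate_v2 as translate

client = translate.Client()
samsum = load_dataset("samsum", split="train")

def to_russian(text: str) -> str:
    # translate() returns a dict containing the translated string
    return client.translate(text, target_language="ru")["translatedText"]

sample = samsum[0]
print(to_russian(sample["dialogue"]))
print(to_russian(sample["summary"]))
```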

🤗 Moreover, we have implemented a **Telegram bot [@summarization_bot](https://t.me/summarization_bot)** that runs inference with this model. Add it to a chat and get summaries instead of dozens of spam messages! 🤗
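
A minimal sketch of such a bot is shown below. The published bot's code is not available, so this is an assumption-laden illustration using `python-telegram-bot` 13.x and the same generation settings as the usage example further down:

```python
# Hypothetical sketch, NOT the published bot code: a long-polling Telegram
# bot (python-telegram-bot 13.x) that replies to every text message with a
# summary produced by this model.
from telegram.ext import Updater, MessageHandler, Filters
from transformers import MBartTokenizer, MBartForConditionalGeneration

model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

def summarize(text: str) -> str:
    # Same generation settings as in the usage example below
    input_ids = tokenizer(
        [text], max_length=600, padding="max_length",
        truncation=True, return_tensors="pt",
    )["input_ids"]
    output_ids = model.generate(
        input_ids=input_ids, top_k=0, num_beams=3, no_repeat_ngram_size=3
    )[0]
    return tokenizer.decode(output_ids, skip_special_tokens=True)

def handle_message(update, context):
    # Reply to the incoming message with its summary
    update.message.reply_text(summarize(update.message.text))

updater = Updater(token="YOUR_BOT_TOKEN")  # placeholder: your bot's API token
updater.dispatcher.add_handler(
    MessageHandler(Filters.text & ~Filters.command, handle_message)
)
updater.start_polling()
updater.idle()
```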


### ❓ How to use with code
```python
from transformers import MBartTokenizer, MBartForConditionalGeneration

# Download the model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

article_text = "..."  # the dialogue text to summarize

# Tokenize, padding/truncating the dialogue to 600 tokens
input_ids = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Generate the summary with beam search
output_ids = model.generate(
    input_ids=input_ids,
    top_k=0,
    num_beams=3,
    no_repeat_ngram_size=3,
)[0]

summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
```
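
Alternatively, the standard 🤗 `pipeline` API should also work (a short sketch; generation parameters then default to those stored in the model config):

```python
from transformers import pipeline

# Summarization pipeline; loads the model and tokenizer from the Hub
summarizer = pipeline("summarization", model="Kirili4ik/mbart_ruDialogSum")

dialogue = "..."  # the dialogue text to summarize
print(summarizer(dialogue)[0]["summary_text"])
```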