---
datasets: SberDevices/Golos
---
# **Acoustic and language models**

The acoustic model is built on the [QuartzNet15x5](https://arxiv.org/pdf/1910.10261.pdf) architecture and trained with the [NeMo toolkit](https://github.com/NVIDIA/NeMo/tree/r1.0.0b4).
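
QuartzNet is trained with a CTC loss, so the simplest way to turn its per-frame outputs into text is greedy decoding: take the best-scoring symbol in each frame, collapse consecutive repeats, and drop the blank symbol. The sketch below illustrates that idea in plain Python. The function name `ctc_greedy_decode`, the toy vocabulary, and the position of the blank index are assumptions made for illustration; they are not taken from the released model or from the NeMo code.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocabulary: Sequence[str],
    blank_id: int,
) -> str:
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks.

    frame_scores[t][k] is the score (log-probability or logit) of symbol k
    at time step t. blank_id is the index of the CTC blank symbol
    (assumed here; check the model config for the real value).
    """
    decoded: List[str] = []
    previous: Optional[int] = None
    for frame in frame_scores:
        # Index of the highest-scoring symbol in this frame.
        best = max(range(len(frame)), key=lambda k: frame[k])
        # Emit only when the symbol changes and is not the blank.
        if best != previous and best != blank_id:
            decoded.append(vocabulary[best])
        previous = best
    return "".join(decoded)


# Toy example with a hypothetical three-symbol vocabulary plus a blank.
toy_vocabulary = ["д", "а", " "]
toy_blank_id = len(toy_vocabulary)  # assumption: blank is the last index


def one_hot(index: int, size: int = 4) -> List[float]:
    """Build a frame whose highest score is at the given index."""
    return [1.0 if k == index else 0.0 for k in range(size)]


toy_frames = [one_hot(i) for i in [0, 0, 3, 1, 1]]
print(ctc_greedy_decode(toy_frames, toy_vocabulary, toy_blank_id))
```

A blank between two identical symbols keeps them as separate characters, which is how CTC represents doubled letters.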


Three n-gram language models were created using the [KenLM Language Model Toolkit](https://kheafield.com/code/kenlm):

* LM built on the [Common Crawl](https://commoncrawl.org) Russian dataset
* LM built on the [Golos](https://huggingface.co/datasets/SberDevices/Golos) train set
* LM built on the [Common Crawl](https://commoncrawl.org) and [Golos](https://huggingface.co/datasets/SberDevices/Golos) datasets together (50/50)

| Archives                 | Size       |  Links          |
|--------------------------|------------|-----------------|
| QuartzNet15x5_golos.nemo | 68 MB      | https://sc.link/ZMv |
| KenLMs.tar               | 4.8 GB     | https://sc.link/YL0  |


Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in [ML Space](https://sbercloud.ru/ru/aicloud/mlspace), a full-cycle machine learning development platform for data science team collaboration, built on the Christofari supercomputer.


## **Evaluation**

Word Error Rate (WER, in percent) on different test sets:


| Decoder \ Test set    | Crowd test  | Farfield test    | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|-------------------------------------|-----------|----------|-----------|----------|
| Greedy decoder                      | 4.389 %   | 14.949 % | 9.314 %   | 11.278 % |
| Beam Search with Common Crawl LM    | 4.709 %   | 12.503 % | 6.341 %   | 7.976 % |
| Beam Search with Golos train set LM | 3.548 %   | 12.384 % |  -        | -       |
| Beam Search with Common Crawl and Golos LM | 3.318 %   | 11.488 % | 6.4 %     | 8.06 %   |


<sup>1</sup> [Common Voice](https://commonvoice.mozilla.org) - Mozilla's initiative to help teach machines how real people speak.
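
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation for a single utterance. The function name `word_error_rate` is hypothetical, and the text normalization and corpus-level aggregation used to produce the numbers in the table are not described in this card, so this code should not be read as the evaluation script itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one reference/hypothesis pair.

    WER = 100 * (substitutions + deletions + insertions) / reference words,
    computed with a word-level Levenshtein distance.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev_row[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev_row[j - 1] + (ref_word != hyp_word)
            deletion = prev_row[j] + 1
            insertion = curr_row[j - 1] + 1
            curr_row.append(min(substitution, deletion, insertion))
        prev_row = curr_row

    return 100.0 * prev_row[-1] / len(ref_words)


# One substituted word and one missing word against a four-word reference.
print(word_error_rate("включи свет на кухне", "включи цвет на"))
```

When scoring a whole test set, a common choice is to sum the edit counts over all utterances and divide by the total number of reference words, rather than averaging per-utterance percentages.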

##  **Resources**

[[arxiv.org] Golos: Russian Dataset for Speech Research](https://arxiv.org/abs/2106.10161)

[[habr.com] Golos: the largest manually annotated Russian-language speech dataset, now publicly available (in Russian)](https://habr.com/ru/company/sberdevices/blog/559496/)

[[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)](https://habr.com/ru/company/sberdevices/blog/569082/)