Commit 1c79663 by asahi417 (1 parent: 3151dd8)

Update README.md

  1. README.md +31 -53
README.md CHANGED
@@ -6,48 +6,17 @@ tags:
   - audio
   - automatic-speech-recognition
   - hf-asr-leaderboard
- metrics:
- - wer
- - cer
- model-index:
- - name: kotoba-tech/kotoba-whisper-v2.0
- results:
- - task:
- type: automatic-speech-recognition
- dataset:
- name: CommonVoice_8.0 (Japanese)
- type: japanese-asr/ja_asr.common_voice_8_0
- metrics:
- - name: WER
- type: WER
- value: 58.9
- - name: CER
- type: CER
- value: 9.2
- - task:
- type: automatic-speech-recognition
- dataset:
- name: ReazonSpeech (Test)
- type: japanese-asr/ja_asr.reazonspeech_test
- metrics:
- - name: WER
- type: WER
- value: 55.6
- - name: CER
- type: CER
- value: 11.63
- - task:
- type: automatic-speech-recognition
- dataset:
- name: JSUT Basic5000
- type: japanese-asr/ja_asr.jsut_basic5000
- metrics:
- - name: WER
- type: WER
- value: 63.8
- - name: CER
- type: CER
- value: 8.4
+ widget:
+ - example_title: CommonVoice 8.0 (Test Split)
+ src: >-
+ https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0/resolve/main/sample.flac
+ - example_title: JSUT Basic 5000
+ src: >-
+ https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000/resolve/main/sample.flac
+ - example_title: ReazonSpeech (Test Split)
+ src: >-
+ https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test/resolve/main/sample.flac
+ pipeline_tag: automatic-speech-recognition
   datasets:
   - japanese-asr/whisper_transcriptions.reazonspeech.all
   - japanese-asr/whisper_transcriptions.reazonspeech.all.wer_10.0
@@ -66,17 +35,26 @@ Following table presents the raw CER (unlike usual CER where the punctuations ar
   along with the.
 
 
- | model | CommonVoice 8.0 (Japanese) | JSUT Basic 5000 | ReazonSpeech Test |
- |:---------------------------------------------------------|---------------------------------------:|-------------------------------------:|----------------------------------------:|
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator + stable-ts) | 13.7 | 11.4 | 17 |
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator) | 13.8 | 11.6 | 17.3 |
- | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (stable-ts) | 15.5 | 15.4 | 17 |
- | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 15.4 | 15.4 | 17.4 |
- | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator + stable-ts) | 13.7 | 11.2 | 17.4 |
- | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator) | 13.9 | 11.4 | 18 |
- | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (stable-ts) | 15.7 | 15 | 17.7 |
- | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 15.6 | 15.2 | 17.8 |
- | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 12.9 | 13.4 | 20.6 |
+ | model | [CommonVoice 8 (Japanese test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.common_voice_8_0) | [JSUT Basic 5000](https://huggingface.co/datasets/japanese-asr/ja_asr.jsut_basic5000) | [ReazonSpeech (held out test set)](https://huggingface.co/datasets/japanese-asr/ja_asr.reazonspeech_test) |
+ |:--------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------:|----------------------------------------------------------------------------------------:|------------------------------------------------------------------------------------------------------------:|
+ | [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0) | 17.6 | 15.4 | 17.4 |
+ | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) | 17.7 | 15.4 | 17 |
+ | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator + stable-ts) | 17.7 | 15.4 | 17 |
+ | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (punctuator) | 17.7 | 15.4 | 17 |
+ | [kotoba-tech/kotoba-whisper-v2.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.1) (stable-ts) | 17.7 | 15.4 | 17 |
+ | [kotoba-tech/kotoba-whisper-v1.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.0) | 17.8 | 15.2 | 17.8 |
+ | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) | 17.9 | 15 | 17.8 |
+ | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator + stable-ts) | 17.9 | 15 | 17.8 |
+ | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (punctuator) | 17.9 | 15 | 17.8 |
+ | [kotoba-tech/kotoba-whisper-v1.1](https://huggingface.co/kotoba-tech/kotoba-whisper-v1.1) (stable-ts) | 17.9 | 15 | 17.8 |
+ | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) | 15.3 | 13.4 | 20.5 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 15.9 | 10.6 | 34.6 |
+ | [openai/whisper-large](https://huggingface.co/openai/whisper-large) | 16.6 | 11.3 | 40.7 |
+ | [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) | 17.9 | 13.1 | 39.3 |
+ | [openai/whisper-base](https://huggingface.co/openai/whisper-base) | 34.5 | 26.4 | 76 |
+ | [openai/whisper-small](https://huggingface.co/openai/whisper-small) | 21.5 | 18.9 | 48.1 |
+ | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) | 58.8 | 38.3 | 153.3 |
+
 
   Regarding to the normalized CER, since those update from v2.1 will be removed by the normalization, kotoba-tech/kotoba-whisper-v2.1 marks the same CER values as [kotoba-tech/kotoba-whisper-v2.0](https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0).
 
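The raw versus normalized CER distinction mentioned in the README text can be illustrated with a small, self-contained sketch. This is not the evaluation script behind the table: `strip_punctuation` is a hypothetical normalizer that simply drops Unicode punctuation and whitespace, and the sentences are made-up examples. It shows that a hypothesis differing from the reference only in punctuation gets a nonzero raw CER but a zero normalized CER, and that CER can exceed 100 when the hypothesis is much longer than the reference.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent: edit distance / reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def strip_punctuation(text: str) -> str:
    """Hypothetical normalizer (an assumption, not the authors' code):
    remove Unicode punctuation (categories P*) and whitespace."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )


if __name__ == "__main__":
    reference = "今日は、いい天気ですね。"   # made-up reference with punctuation
    hypothesis = "今日はいいI天気ですね".replace("I", "")  # same text, no punctuation

    raw = cer(reference, hypothesis)
    normalized = cer(strip_punctuation(reference), strip_punctuation(hypothesis))
    print(f"raw CER: {raw:.2f}")
    print(f"normalized CER: {normalized:.2f}")

    # Three inserted characters against a two-character reference.
    print(f"CER above 100: {cer('ab', 'abxyz'):.1f}")
```

Because the raw metric counts the two missing punctuation marks as errors, only the raw score separates a punctuated output from an unpunctuated one; after normalization both score the same.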