---
datasets:
- mozilla-foundation/common_voice_11_0
language:
- el
metrics:
- wer
license: apache-2.0
---

# Whisper small adapters model for Greek transcription
We added adapters to whisper-small model then we finetuned it on Greek ASR. During training, the model is frozen and only the adapters are being trained. When trying to 
transcribe Greek, we need to activate the adapters, otherwise we can ignore the adapters and use the original whisper model.
## How to use

Start by installing transformers with Whisper model with added adapters
```bash
git clone https://gitlab.com/horizon-europe-voxreality/multilingual-translation/speech-translation-demo.git
cd speech-translation-demo
# You might need to switch to dev branch
pip install -e transformers
```
The parameter `use_adapters` is used to decide whether we will use the adapters or not. It needs to be set to True only in the case of Greek.

```python
from transformers import WhisperProcessor, WhisperForConditionalGenerationWithAdapters
from datasets import Audio, load_dataset

# load model and processor
processor = WhisperProcessor.from_pretrained("voxreality/whisper-small-el-adapters")
model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="greek", task="transcribe")

# load streaming dataset and read first audio sample
ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
input_features = processor(input_speech["array"], sampling_rate=input_speech["sampling_rate"], return_tensors="pt").input_features

# Set use_adapters to False for languages other than Greek.
# generate token ids
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids, use_adapters=True)

# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```

You can also use an HF pipeline:
```python
from transformers import pipeline
from datasets import Audio, load_dataset

ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]

model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")

pipe = pipeline("automatic-speech-recognition", model=model, tokenizer="voxreality/whisper-small-el-adapters",
                  "voxreality/whisper-small-el-adapters", device='cpu', batch_size=32)

transcription = pipe(input_speech['array'], generate_kwargs = {"language":f"<|el|>","task": "transcribe", "use_adapters": True})
```