WillHeld/DiVA-llama-3-v0-8b

WillHeld committed 4 days ago

Commit 3a66cfa • 1 Parent(s): 49f38f9

Update README.md

Files changed (1)

README.md +33 -0

@@ -11,6 +11,39 @@ base_model:
 This is an end-to-end voice assistant model that accepts both speech and text as input. It is trained with a distillation loss. More details are in the [pre-print](https://arxiv.org/abs/2410.02678).
 See the model in action at [diva-audio.github.io](https://diva-audio.github.io).
+### Inference Example
+```python
+from transformers import AutoModel
+import librosa
+import wget
+# Download a sample spoken question from SD-QA and load it at 16 kHz.
+filename = wget.download(
+    "https://github.com/ffaisal93/SD-QA/raw/refs/heads/master/dev/eng/irl/wav_eng/-1008642825401516622.wav"
+)
+speech_data, _ = librosa.load(filename, sr=16_000)
+# trust_remote_code=True loads the custom DiVA model code from the Hub repo.
+model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3-v0-8b", trust_remote_code=True)
+# Speech-only input, then the same speech with a text instruction.
+print(model.generate([speech_data]))
+print(model.generate([speech_data], ["Reply Briefly Like A Pirate"]))
+filename = wget.download(
+    "https://github.com/ffaisal93/SD-QA/raw/refs/heads/master/dev/eng/irl/wav_eng/-2426554427049983479.wav"
+)
+speech_data2, _ = librosa.load(filename, sr=16_000)
+# Batched generation: one text instruction per audio clip.
+print(
+    model.generate(
+        [speech_data, speech_data2],
+        ["Reply Briefly Like A Pirate", "Reply Briefly Like A New Yorker"],
+    )
+)
+```
 ## Citation
 **BibTeX:**