Masioki/fusion_gttbsc_distilbert-uncased-ft

fusion-cross-attention-sentence-classifier

Generated from Trainer


Masioki committed on Jun 17

Commit c1a4ef6 (1 parent: 14a0cd1)

Update README.md

Files changed (1)

README.md +29 -11

@@ -3,7 +3,25 @@ tags:
 - generated_from_trainer
 model-index:
 - name: fusion_gttbsc_distilbert-uncased-ft
-  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -11,21 +29,23 @@ should probably proofread and complete it, then remove this comment. -->
 # fusion_gttbsc_distilbert-uncased-ft
-This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
@@ -41,8 +61,6 @@ The following hyperparameters were used during training:
 - num_epochs: 20
 - mixed_precision_training: Native AMP
-### Training results
 ### Framework versions

 - generated_from_trainer
 model-index:
 - name: fusion_gttbsc_distilbert-uncased-ft
+  results:
+    - task:
+        type: dialogue act classification
+      dataset:
+        name: asapp/slue-phase-2
+        type: hvb
+      metrics:
+        - name: F1 macro E2E
+          type: F1 macro
+          value: TBA
+        - name: F1 macro GT
+          type: F1 macro
+          value: TBA
+datasets:
+- asapp/slue-phase-2
+language:
+- en
+metrics:
+- f1-macro
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # fusion_gttbsc_distilbert-uncased-ft
+Multi-label dialogue act classification (DAC) model that fuses ground-truth text with prosody and ASR encodings through residual cross-attention.
+## Model description
+ASR encoder: [Whisper small](https://huggingface.co/openai/whisper-small) encoder
+Prosody encoder: 2-layer Transformer encoder with an initial dense projection
+Backbone: [DistilBERT uncased](https://huggingface.co/distilbert/distilbert-base-uncased)
+Fusion: two residual cross-attention fusion layers (F_asr x F_text and F_prosody x F_text) with a dense layer on top
+Pooling: self-attention
+Multi-label classification head: two dense layers with a Tanh activation in between and two dropout layers (p = 0.3)
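
The fusion step in the model description can be illustrated with a minimal, dependency-free sketch of one residual cross-attention layer. This is not the author's implementation. It assumes that the text features act as queries and the ASR (or prosody) features act as keys and values, that both share the same dimension, and that a single head without learned projections is used; the function name `residual_cross_attention` and the toy inputs are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def residual_cross_attention(text, other):
    """Fuse `other` (ASR or prosody features) into `text` features.

    Assumption: text vectors are the queries, `other` vectors are both
    keys and values, and all vectors have the same dimension d.
    Returns text + Attention(text, other, other), one vector per text token.
    """
    d = len(text[0])
    fused = []
    for q in text:
        # Scaled dot-product score between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in other]
        weights = softmax(scores)
        # Attention output: weighted sum of the value vectors.
        context = [sum(w * v[j] for w, v in zip(weights, other)) for j in range(d)]
        # Residual connection: add the attended context back onto the query.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Toy inputs: 2 text tokens and 3 ASR frames, dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
asr_feats = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
fused = residual_cross_attention(text_feats, asr_feats)
```

The output keeps the text sequence length, so under these assumptions the two fused streams (ASR and prosody) could be combined token by token before the dense layer; how the actual model combines them is not stated in the card.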
 ## Training and evaluation data
+Trained on ground-truth text.
+Evaluated on ground truth (GT) and normalized [Whisper small](https://huggingface.co/openai/whisper-small) transcripts (E2E).
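
For reference, macro F1 on multi-label predictions can be computed as sketched below. This is a plain-Python illustration, not the evaluation script behind the card: it assumes binary indicator vectors per utterance and scores a label with no true or predicted positives as 0.0, and the helper name `macro_f1` and the toy data are hypothetical. In practice `sklearn.metrics.f1_score(..., average="macro")` would be a common choice.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 for multi-label data.

    y_true, y_pred: lists of equal-length 0/1 lists, one row per example.
    F1 is computed per label, then averaged with equal weight per label.
    Assumption: a label with no true and no predicted positives scores 0.0.
    """
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / n_labels


# Toy example: 3 utterances, 3 dialogue-act labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
score = macro_f1(y_true, y_pred)  # per-label F1: 1.0, 2/3, 0.0
```

Because every label counts equally, rare dialogue acts influence the score as much as frequent ones.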
 ### Training hyperparameters
 - num_epochs: 20
 - mixed_precision_training: Native AMP
 ### Framework versions