Merge branch 'main' of https://huggingface.co/slseanwu/compose-and-embellish-pop1k7 into main
README.md
tags:
- pytorch
- audio
- music
- piano
license: mit
---

# Compose & Embellish: Piano Performance Generation Pipeline

Trained model weights and training datasets for the paper:

* Shih-Lun Wu and Yi-Hsuan Yang, "[Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach](https://arxiv.org/abs/2209.08212)."

### Stage 1: "Compose" model

Generates **melody and chord progression** from scratch.

- Model backbone: 12-layer Transformer w/ relative positional encoding
- Num trainable params: 41.3M
- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications (illustrated below)
- Pretraining dataset: subset of [Lakh MIDI full](https://colinraffel.com/projects/lmd/) (**LMD-full**), 14934 songs
  - melody extraction (and data filtering) done by **matching lyrics to tracks**: https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py
  - structural segmentation done with **A\* search**: https://github.com/Dsqvival/hierarchical-structure-analysis
- Finetuning dataset: subset of [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1591 songs
  - melody extraction done with the **skyline algorithm** (see the sketch after this list): https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py
  - structural segmentation done in the same way as for the pretraining dataset
- Training sequence length: 2400
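
For intuition, REMI flattens a bar of MIDI into a stream of metrical and note events. The snippet below is a purely illustrative sketch of such a stream; the exact token spellings in this repository's modified vocabulary may differ.

```python
# Illustrative REMI-style event stream for one bar. Token names loosely
# follow the REMI paper; this repo's "slight modifications" are not
# reflected here.
one_bar = [
    "Bar_None",                                   # barline marker
    "Position_1/16", "Tempo_110", "Chord_C_maj",  # beat-level context
    "Position_1/16", "Note_Velocity_20",          # first note starts on beat 1
    "Note_On_60", "Note_Duration_4",              # pitch C4, short duration
    "Position_9/16", "Note_Velocity_18",          # second note on beat 3
    "Note_On_64", "Note_Duration_8",              # pitch E4, longer duration
]
```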
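The skyline heuristic referenced above simply keeps the highest-sounding note at each onset. Below is a minimal sketch, assuming notes arrive as `(onset, pitch, duration)` tuples; the linked `analyzer.py` implements a more elaborate variant.

```python
from collections import defaultdict

def skyline(notes):
    """Keep only the highest-pitch note among notes sharing an onset."""
    by_onset = defaultdict(list)
    for onset, pitch, duration in notes:
        by_onset[onset].append((pitch, duration))
    return [
        (onset, *max(by_onset[onset]))  # max() picks the highest pitch
        for onset in sorted(by_onset)
    ]

# Two simultaneous notes at tick 0 -> only the higher one (G4) survives.
print(skyline([(0, 60, 480), (0, 67, 480), (480, 64, 480)]))
# [(0, 67, 480), (480, 64, 480)]
```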

### Stage 2: "Embellish" model

Generates **accompaniment, timing and dynamics** conditioned on Stage 1 outputs.

- Model backbone: 12-layer **Performer** ([paper](https://arxiv.org/abs/2009.14794), [implementation](https://github.com/idiap/fast-transformers)); see the sketch after this list
- Num trainable params: 38.2M
- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
- Training dataset: [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1747 songs
- Training sequence length: 3072
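
For reference, a Performer-style causal decoder can be assembled with the linked fast-transformers builder API. This is a minimal sketch, not this repo's exact configuration: only the 12-layer depth and the 3072-token sequence length come from the card above; head count, model width, and the FAVOR feature dimension are illustrative guesses.

```python
import torch
from fast_transformers.builders import TransformerEncoderBuilder
from fast_transformers.feature_maps import Favor
from fast_transformers.masking import TriangularCausalMask

model = TransformerEncoderBuilder.from_kwargs(
    n_layers=12,                           # per the card above
    n_heads=8,                             # illustrative
    query_dimensions=64,                   # d_model = 8 * 64 = 512
    value_dimensions=64,
    feed_forward_dimensions=2048,          # illustrative
    attention_type="causal-linear",        # linear attention, autoregressive
    feature_map=Favor.factory(n_dims=256)  # FAVOR+ random features (Performer)
).get()

x = torch.randn(1, 3072, 512)              # (batch, seq_len, d_model)
y = model(x, attn_mask=TriangularCausalMask(x.shape[1]))
print(y.shape)                             # torch.Size([1, 3072, 512])
```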

## BibTeX

If you find the materials useful, please consider citing our work: