The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion To Singing Style Conversion
Lester Phillip Violeta¹, Xueyao Zhang², Jiatong Shi³, Yusuke Yasuda⁴, Wen-Chin Huang¹, Zhizheng Wu², Tomoki Toda¹
¹ Nagoya University, Japan
² The Chinese University of Hong Kong, Shenzhen, China
³ Carnegie Mellon University, USA
⁴ National Institute of Informatics, Japan
Audio Samples
This page presents converted singing samples from a set of baseline and proposed systems
across two singing style conversion tasks.
For more details, refer to the paper:
https://arxiv.org/abs/2509.15629
System abbreviations
B1, B2, B3 — Baseline systems 1–3
S1B–S7B — Proposed systems 1–7 (best configuration)
S1A, S3A, S4A, S6A — Ablation variants of the corresponding proposed systems
Style pairs
Each cell corresponds to a source → target style conversion.
Source styles: Breathy, Control, Falsetto, Mixed.
Target styles: Glissando, Pharyngeal, Vibrato.
Task 1: In-Domain Singing Style Conversion (singerA)
Source style: Breathy (singerA, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Breathy)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Source style: Control (singerA, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Control)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Source style: Falsetto (singerA, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Falsetto)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Source style: Mixed (singerA, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Mixed)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Task 2: Zero-Shot Singing Style Conversion (singerB)
Note: Systems S3 and S4 were not submitted for this task.
Source style: Breathy (singerB, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Breathy)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B
Source style: Control (singerB, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Control)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B
Source style: Falsetto (singerB, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Falsetto)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B
Source style: Mixed (singerB, utterance 0000)
System
→ Glissando
→ Pharyngeal
→ Vibrato
Source (Mixed)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B