The Singing Voice Conversion Challenge 2025:
From Singer Identity Conversion To Singing Style Conversion

Lester Phillip Violeta¹, Xueyao Zhang², Jiatong Shi³, Yusuke Yasuda⁴, Wen-Chin Huang¹, Zhizheng Wu², Tomoki Toda¹

¹Nagoya University, Japan    ²The Chinese University of Hong Kong, Shenzhen, China
³Carnegie Mellon University, USA    ⁴National Institute of Informatics, Japan

Audio Samples

This page presents converted singing samples from a set of baseline and proposed systems across two singing style conversion tasks.

For more details, refer to the paper: https://arxiv.org/abs/2509.15629


Style pairs: each cell corresponds to a source → target style conversion.
Source styles: Breathy, Control, Falsetto, Mixed.
Target styles: Glissando, Pharyngeal, Vibrato.
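
As a rough illustration of how the sample grid on this page is laid out, the sketch below enumerates the source → target style pairs and the system rows listed for each task. The style names and system labels are copied from this page; the variable and function names (`SOURCE_STYLES`, `style_pairs`, `grid_cells`, and so on) are hypothetical and do not come from the challenge's own tooling.

```python
from itertools import product

# Style names as listed on this page.
SOURCE_STYLES = ["Breathy", "Control", "Falsetto", "Mixed"]
TARGET_STYLES = ["Glissando", "Pharyngeal", "Vibrato"]

# System labels in the order they appear in the Task 1 tables.
TASK1_SYSTEMS = [
    "B1", "B2", "B3",
    "S1B", "S1A", "S2B", "S3B", "S3A",
    "S4B", "S4A", "S5B", "S6B", "S6A", "S7B",
]

# The Task 2 tables list no S3 or S4 entries (see the note under Task 2).
TASK2_SYSTEMS = [s for s in TASK1_SYSTEMS if not s.startswith(("S3", "S4"))]


def style_pairs():
    """Return every (source style, target style) combination."""
    return list(product(SOURCE_STYLES, TARGET_STYLES))


def grid_cells(systems):
    """Return one (system, source, target) tuple per converted-sample cell."""
    return [(system, src, tgt) for src, tgt in style_pairs() for system in systems]


if __name__ == "__main__":
    print(len(style_pairs()), "style pairs")
    print(len(grid_cells(TASK1_SYSTEMS)), "Task 1 cells")
    print(len(grid_cells(TASK2_SYSTEMS)), "Task 2 cells")
```

The counts cover converted samples only; the Source and Target (GT) reference rows in each table are not included.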
Task 1: In-Domain Singing Style Conversion (singerA)
Source style: Breathy (singerA, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Breathy)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Source style: Control (singerA, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Control)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Source style: Falsetto (singerA, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Falsetto)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Source style: Mixed (singerA, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Mixed)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S3B
S3A
S4B
S4A
S5B
S6B
S6A
S7B
Task 2: Zero-Shot Singing Style Conversion (singerB)
Note: Systems S3 and S4 were not submitted for this task.
Source style: Breathy (singerB, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Breathy)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B
Source style: Control (singerB, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Control)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B
Source style: Falsetto (singerB, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Falsetto)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B
Source style: Mixed (singerB, utterance 0000)
System | → Glissando | → Pharyngeal | → Vibrato
Source (Mixed)
Target (GT)
B1
B2
B3
S1B
S1A
S2B
S5B
S6B
S6A
S7B