Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

Official Audio Samples

Authors: Lester Phillip Violeta¹, Wen-Chin Huang¹, Ding Ma¹, Ryuichi Yamamoto¹, Kazuhiro Kobayashi¹˒², Tomoki Toda¹
1: Nagoya University, Japan
2: TARVO Inc., Japan

Transcriptions:
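Utterance 1: 上司のおごりで、しゃぶしゃぶの食べ放題に行った。/ jyoushi no ogori de, shabushabu no tabehoudai ni itta. / "My boss treated me to all-you-can-eat shabu-shabu."
Utterance 2: 新幹線で23時に名古屋に着きます。/ shinkansen de ni jyuu san ji ni nagoya ni tsukimasu. / "I will arrive in Nagoya at 23:00 on the Shinkansen."
Utterance 3: 映画館でポップコーンを食べる。/ eigakan de poppukoon wo taberu. / "I eat popcorn at the movie theater."
Utterance 4: おととい届いたウォーターサーバーが故障した。/ ototoi todoita woota saaba ga koshou shita. / "The water dispenser that arrived the day before yesterday broke down."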

System and Description	Utterance 1	Utterance 2	Utterance 3	Utterance 4
Source EL (electrolaryngeal input speech)
Target GT (ground-truth target speech)
(1) Baseline, mel/mel, TTS/AE
(2) Baseline, mel/mel, Parallel VC
(3) Proposed, PPG/HuBERT, TTS/AE
(4) Proposed, PPG/HuBERT, Parallel VC
(5) Ablation, PPG/mel, Parallel VC