| Method | Chinese | English |
|---|---|---|
| FastSpeech2 | | |
| StyleSpeech | N/A | |
| VITS | | |
| IPA-TTS | | |
| LanStyleTTS Base | | |
| LanStyleTTS VITS | | |
| Ground Truth | | |

| Method | Chinese | English |
|---|---|---|
| Style Adaptation | | |
| LanStyleTTS Base (w/o s) | | |
| LanStyleTTS Base (w s) | | |
| LanStyleTTS VITS (w/o s) | | |
| LanStyleTTS VITS (w s) | | |
| Style Adaptation | | |
| LanStyleTTS Base (Alpha) | | |
| LanStyleTTS Base (IPA) | | |
| LanStyleTTS Base (IPA + Style) | | |
| LanStyleTTS VITS (Alpha) | | |
| LanStyleTTS VITS (IPA) | | |
| LanStyleTTS VITS (IPA + Style) | | |
| Acoustic Feature & Vocoder | | |
| MelSpec + WaveGlow | | |
| Latent + AE Decoder | | |