This website provides a handful of example audio samples and our listening test instructions, giving reviewers additional details about our submission during the anonymous review process. With the open-source release of all code and models, we will also provide an interactive demo that will replace this temporary static demo. Please do not visit the repository that hosts this demo page, as doing so could reveal the authors' identities.
Audio Samples
Selection of High-Resource Languages
Abundant data is available.
Voice 1 - English | Voice 2 - English
Voice 1 - German | Voice 2 - German
Voice 1 - French | Voice 2 - French
Voice 1 - Spanish | Voice 2 - Spanish
Selection of Mid-Resource Languages
Little data, mostly of low quality, is available.
Voice 1 - Vietnamese | Voice 2 - Vietnamese
Voice 1 - Welsh | Voice 2 - Welsh
Selection of Low-Resource Languages
No data is available (zero-shot inference).
Voice 1 - Breton | Voice 2 - Breton
Voice 1 - Aymara | Voice 2 - Aymara
Listening Test Instructions
After a welcome page and a page with a test audio sample for headphone calibration, the listening test participants received the following instructions on how to rate the audio samples.
Vietnamese
For the Vietnamese test, we provided instructions in the participants’ native language.
How similar does the following artificially generated speech sample sound to Vietnamese spoken by a native speaker? Please rate it on a scale from "very dissimilar" (like a foreigner imitating Vietnamese without any training or practice) to "very similar" (like a native speaker).
Please focus on the pronunciation of words, speech rhythm, and intonation. Base your rating on your subjective feeling, and ignore aspects relating to audio quality and the human-likeness of the voice.
Very Dissimilar / Dissimilar / Somewhat Similar / Similar / Very Similar
Breton
Since we expect native speakers of Breton to speak English, we provided the instructions in English.
How similar does the following artificially generated speech sample sound to someone speaking Breton as a native speaker? Please rate it on a scale from "bad similarity" (foreign speaker imitating Breton without any training) to "excellent similarity" (native speaker of Breton).
Please focus on the pronunciation of words, speech rhythm, and intonation. Base your rating on your subjective feeling, and ignore aspects relating to the audio fidelity or human-likeness of the audio quality.
Very Poor Similarity / Poor Similarity / Fair Similarity / Good Similarity / Excellent Similarity
Welsh
Since we expect native speakers of Welsh to speak English, we provided the instructions in English.
How similar does the following artificially generated speech sample sound to someone speaking Welsh as a native speaker? Please rate it on a scale from "bad similarity" (foreign speaker imitating Welsh without any training) to "excellent similarity" (native speaker of Welsh).
Please focus on the pronunciation of words, speech rhythm, and intonation. Base your rating on your subjective feeling, and ignore aspects relating to the audio fidelity or human-likeness of the audio quality.
Very Poor Similarity / Poor Similarity / Fair Similarity / Good Similarity / Excellent Similarity
Aymara
Since there is no widely used writing system in Aymara, and we expect native Aymara speakers to also speak Spanish, we provided the instructions in Spanish.
How similar does the following artificially generated speech sample sound to someone speaking Aymara as a native speaker? Please rate it on a scale from "Very poor similarity" (foreign speaker imitating Aymara without any training) to "Excellent similarity" (native speaker of Aymara).
Focus on the pronunciation of words, speech rhythm, and intonation. Base your rating on your subjective impressions, and ignore aspects relating to the audio fidelity or its similarity to a human voice.
Very Poor Similarity / Poor Similarity / Fair Similarity / Good Similarity / Excellent Similarity