
This website provides a handful of example samples and our listening test instructions to give reviewers more detail on our submission during the anonymous review process. Once all code and models are released as open source, an interactive demo will replace this temporary static one. Please do not visit the repository from which this demo page is hosted, as this could reveal the authors' identities.

Audio Samples

Selection of High-Resource Languages

abundant data is available

Voice 1 - English Voice 2 - English
Voice 1 - German Voice 2 - German
Voice 1 - French Voice 2 - French
Voice 1 - Spanish Voice 2 - Spanish

Selection of Mid-Resource Languages

little data is available, mostly of low quality

Voice 1 - Vietnamese Voice 2 - Vietnamese
Voice 1 - Welsh Voice 2 - Welsh

Selection of Low-Resource Languages

no data is available; samples are generated by zero-shot inference

Voice 1 - Breton Voice 2 - Breton
Voice 1 - Aymara Voice 2 - Aymara

Listening Test Instructions

After a welcome page and a page with a test audio sample for headphone calibration, the participants of the listening test received the following instructions on how to rate the audio samples.

Vietnamese

For the Vietnamese test, we provided instructions in the participants’ native language.

Mẫu giọng nói nhân tạo dưới đây giống tiếng Việt của người bản ngữ đến mức nào? Vui lòng đánh giá theo thang điểm từ "rất không giống" (như người nước ngoài bắt chước tiếng Việt mà không thông qua đào tạo/rèn luyện) đến "rất giống" (như người bản ngữ).

Vui lòng tập trung vào cách phát âm của từ, nhịp điệu lời nói và ngữ điệu. Đánh giá dựa trên cảm giác chủ quan của bạn và bỏ qua các khía cạnh về chất lượng âm thanh và độ giống người thật của giọng nói.

Rất không giống / Không giống / Tạm giống / Giống / Rất giống

Breton

Since we expect native speakers of Breton to speak English, we provided the instructions in English.

How similar does the following artificially generated speech sample sound to someone speaking Breton as a native speaker? Please rate it on a scale from "bad similarity" (foreign speaker imitating Breton without any training) to "excellent similarity" (native speaker of Breton).

Please focus on the pronunciation of words, speech rhythm, and intonation. Base your rating on your subjective feeling, and ignore aspects relating to the audio fidelity or human-likeness of the audio quality.

Very Poor Similarity / Poor Similarity / Fair Similarity / Good Similarity / Excellent Similarity

Welsh

Since we expect native speakers of Welsh to speak English, we provided the instructions in English.

How similar does the following artificially generated speech sample sound to someone speaking Welsh as a native speaker? Please rate it on a scale from "bad similarity" (foreign speaker imitating Welsh without any training) to "excellent similarity" (native speaker of Welsh).

Please focus on the pronunciation of words, speech rhythm, and intonation. Base your rating on your subjective feeling, and ignore aspects relating to the audio fidelity or human-likeness of the audio quality.

Very Poor Similarity / Poor Similarity / Fair Similarity / Good Similarity / Excellent Similarity

Aymara

Since there is no widely used writing system for Aymara, and we expect native Aymara speakers to also speak Spanish, we provided the instructions in Spanish.

¿En qué medida se parece la siguiente muestra de habla generada artificialmente a alguien que habla aymara como nativo? Por favor, puntúela en una escala que va desde "Muy escasa similitud" (hablante extranjero que imita el aymara sin ninguna formación) a "Similitud excelente" (hablante nativo de aymara).

Concéntrese en la pronunciación de las palabras, el ritmo del habla y la entonación. Base su valoración en sus sensaciones subjetivas e ignore los aspectos relacionados con la fidelidad del audio o su similitud con una voz humana.

Muy escasa similitud / Similitud escasa / Similitud regular / Similitud buena / Similitud excelente