Baseline - No text embeddings used
The baseline TTS system generates speech directly from the input text, without any additional conditioning prompt. The text to be read is therefore the only thing that can influence the prosody. This system serves as the reference for comparison.
Proposed - System conditioned on text embeddings
The proposed TTS system is conditioned on natural language prompts. When the text to be read is itself used as the prompt, the goal is more expressive and contextually appropriate speech. Alternatively, a different text can be used as the prompt, which transfers the prosody suggested by that prompt to the text being read.
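The three conditions compared on this page (no prompt, the input text as its own prompt, and a different sentence as prompt) can be written down as a small data structure. This is only an illustrative sketch: `SynthesisRequest` and the three helper functions are hypothetical names and not part of the actual system, and no synthesis is performed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical container for one synthesis call (not the real system's API)."""
    text: str               # sentence that is read aloud
    prompt: Optional[str]   # sentence used for conditioning; None means no prompt


def baseline_request(text: str) -> SynthesisRequest:
    # Baseline: no conditioning prompt, prosody depends on the text alone.
    return SynthesisRequest(text=text, prompt=None)


def self_prompted_request(text: str) -> SynthesisRequest:
    # Proposed, first setting: the text to be read doubles as the prompt.
    return SynthesisRequest(text=text, prompt=text)


def cross_prompted_request(text: str, prompt: str) -> SynthesisRequest:
    # Proposed, second setting: a different sentence supplies the prosody.
    return SynthesisRequest(text=text, prompt=prompt)
```

For example, `cross_prompted_request("I really enjoy the beach in the summer.", prompt="Lily broke up with me last week, in fact, she dumped me.")` describes reading the beach sentence with the prosody suggested by the break-up sentence.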
Using the Input Text as Prompt
Emotion | Input Sentence | Baseline | Proposed |
---|---|---|---|
Anger | You can't be serious, how dare you not tell me you were going to marry her? | ||
Joy | I really enjoy the beach in the summer. | ||
Neutral | You can go to the Employment Development Office and pick it up. | ||
Sadness | Lily broke up with me last week, in fact, she dumped me. | ||
Surprise | He was astonished when he saw them come alone, and asked what had happened to them. | ||
Using a Different Prompt
Emotion | Prompt | Input Sentence | Proposed | |
---|---|---|---|---|
Anger | You can't be serious, how dare you not tell me you were going to marry her? | Lily broke up with me last week, in fact, she dumped me. | ||
Joy | I really enjoy the beach in the summer. | You can go to the Employment Development Office and pick it up. | ||
Neutral | You can go to the Employment Development Office and pick it up. | You can't be serious, how dare you not tell me you were going to marry her? | ||
Sadness | Lily broke up with me last week, in fact, she dumped me. | He was astonished when he saw them come alone, and asked what had happened to them. | ||
Surprise | He was astonished when he saw them come alone, and asked what had happened to them. | I really enjoy the beach in the summer. | ||