IVONA For Developers

Develop with IVONA Text-to-Speech.

6. SSML support

A text/ssml content type is available in the createSpeechFile method, which allows text to be imported in SSML format. The text must conform to the SSML 1.1 Recommendation (http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/) and is validated against the SSML 1.1 basic schema (http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis) when createSpeechFile() is called. If validation fails, the ERR_INVALID_SSML error is returned (with additional validation notes) and the Speech File won’t be created.

The SSML header and footer may be omitted when importing text. The IVONA TTS SaaS API detects their absence and adds them before the validation step, so the following example is still valid:

Alice was awakened by some strange noise...
<voice name="Kimberly">"Who's there?"</voice> she asked fearfully.
<prosody rate="75%"><voice name="Emma">"It's Sally here, don't be afraid. Sleep well!"</voice>
</prosody> Sally answered with her phlegmatic voice.
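
The header/footer detection described above can be approximated on the client side. The following Python sketch is only an illustration of the idea, not the service's actual implementation: the function name ensure_speak_root and the default language "en-US" are assumptions, and only the <speak> element and its namespace are taken from the SSML 1.1 Recommendation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SSML 1.1 Recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ensure_speak_root(text: str, lang: str = "en-US") -> str:
    """Return text wrapped in a <speak> element unless it already has one.

    Hypothetical helper: the real API performs this step server-side.
    """
    stripped = text.strip()
    try:
        root = ET.fromstring(stripped)
        # A complete document already has <speak> as its root element.
        if root.tag in ("speak", f"{{{SSML_NS}}}speak"):
            return stripped
    except ET.ParseError:
        # Not a single-rooted XML document: treat it as a bare fragment.
        pass
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f"{stripped}</speak>"
    )


fragment = 'Alice woke up. <voice name="Kimberly">Who is there?</voice> she asked.'
wrapped = ensure_speak_root(fragment)
```

Passing an already wrapped document through the helper again leaves it unchanged, so it is safe to call on any input. It checks well-formedness only and does not replace the schema validation performed by createSpeechFile().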

When the character price of a text is calculated, all SSML tags are omitted; only the text that will be read counts. It therefore doesn’t matter whether we use the full SSML header or skip it completely.
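
To estimate how many characters will be counted, the markup can be stripped locally. This is a rough Python sketch under stated assumptions: the names spoken_text and spoken_length are hypothetical, and the exact counting rules of the service (for example, how whitespace is treated) are not documented here, so the result is an estimate only.

```python
import xml.etree.ElementTree as ET


def spoken_text(ssml: str) -> str:
    """Return the text content of an SSML document or fragment, without tags."""
    try:
        # Works for a complete document with a single root element.
        root = ET.fromstring(ssml.strip())
    except ET.ParseError:
        # Bare fragment without header/footer: add a temporary root element.
        root = ET.fromstring(f"<speak>{ssml}</speak>")
    # itertext() yields element text and tails in document order.
    return "".join(root.itertext())


def spoken_length(ssml: str) -> int:
    """Estimated number of characters that count towards the price."""
    return len(spoken_text(ssml))


full = '<speak>Hi <voice name="Emma">there</voice>!</speak>'
bare = 'Hi <voice name="Emma">there</voice>!'
```

Both inputs yield the same text, "Hi there!", which matches the statement that the header does not affect the count.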

Changing voice using the <voice> element.

The <voice> element accepts voice names (see table), not voice ids. For example: <voice name="Kimberly">What’s up?</voice>. The rest of the text (except for the parts where the voice is changed using the <voice> element) will be read using the “default” voice id, taken from the voiceId parameter of the createSpeechFile() method.
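
The following Python sketch illustrates which voice reads which part of a text. It is a local illustration only: the function voice_segments and the placeholder "DEFAULT" (standing in for whatever voiceId is passed to createSpeechFile()) are assumptions, and the service may split the text differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{ns}voice' -> 'voice'."""
    return tag.rsplit("}", 1)[-1]


def voice_segments(ssml: str, default_voice: str = "DEFAULT"):
    """Return a list of (voice, text) pairs in reading order."""
    try:
        root = ET.fromstring(ssml.strip())
    except ET.ParseError:
        # Bare fragment: add a temporary root element.
        root = ET.fromstring(f"<speak>{ssml}</speak>")
    segments = []

    def walk(elem, voice):
        # A <voice name="..."> element overrides the voice for its content.
        if local_name(elem.tag) == "voice":
            voice = elem.get("name", voice)
        if elem.text:
            segments.append((voice, elem.text))
        for child in elem:
            walk(child, voice)
            # Text after a child's closing tag belongs to the current element.
            if child.tail:
                segments.append((voice, child.tail))

    walk(root, default_voice)
    return segments


segments = voice_segments("Hello. <voice name='Kimberly'>Hi</voice> Bye.")
```

Here the text inside the <voice> element is attributed to Kimberly, while the text before and after it falls back to the default voice.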

All SSML elements in an imported text are interpreted, with the exception of the <audio> and <lexicon> elements, which are completely ignored. Pronunciation rules should work unless they make the SSML document invalid; in that case the Speech File won’t be created and a validation error message will be returned.
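
Because <audio> and <lexicon> are silently ignored, it can be useful to detect them before submitting a text. A minimal Python sketch, assuming a hypothetical helper named ignored_elements; it only inspects the markup locally and does not call the API.

```python
import xml.etree.ElementTree as ET

# Elements that the text above says are completely ignored.
IGNORED = {"audio", "lexicon"}


def ignored_elements(ssml: str):
    """Return the names of elements in the text that will be ignored."""
    try:
        root = ET.fromstring(ssml.strip())
    except ET.ParseError:
        # Bare fragment: add a temporary root element.
        root = ET.fromstring(f"<speak>{ssml}</speak>")
    found = []
    for elem in root.iter():
        # Strip a namespace prefix such as '{ns}audio' -> 'audio'.
        name = elem.tag.rsplit("}", 1)[-1]
        if name in IGNORED:
            found.append(name)
    return found


sample = 'Hi <audio src="a.wav">fallback</audio> <voice name="Emma">x</voice>'
found = ignored_elements(sample)
```

An empty result means the text contains none of the ignored elements.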