2. System concepts
- Account
- The account of a registered IVONA.com user. An account with an active IVONA TTS SaaS service is required to use the API. Registration (creating new accounts) is not currently available through the API. New accounts can be created on the registration page of the IVONA website: https://secure.ivona.com/account/register.php. Each account is identified by its email address. The SpeechCloud service also uses an API key, which can be generated at https://secure.ivona.com/account/apikey.php. The API key and the email address are used together to authorize requests.
- Speech File
- A sound file generated by the IVONA TTS SaaS text-to-speech process from UTF-8 encoded text supplied by the user. Besides the text, the speech file depends on several other supplied parameters: the voice that will read the text, the codec that determines the output format and sound quality, and additional sound parameters that modify the speech as desired (for example, to change its speed or volume, adjust other sound properties, or set ID3 tags for MP3 files). All speech file data is stored in the database and can be accessed only by its owner. Each speech file is identified by a unique file identifier. Downloading a speech file reduces the number of characters available in the user account’s active SaaS service.
- Text
- The text uploaded by the user with the createSpeechFile() method. The text must be UTF-8 encoded, and its MIME type must be one of the available content types listed below. The text is stored in the IVONA.com website database and can be accessed and deleted only by its owner (the uploader).
| content type | description |
|---|---|
| text/plain | The text is processed with the pronunciation rules and then read as is. |
| text/html | The text is converted from HTML to plain text: all tags are removed or replaced by pauses, making the text suitable for reading. After the conversion, the pronunciation rules are applied. |
| text/ssml | The text is interpreted as SSML 1.1 and validated against the SSML 1.1 basic schema (http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis). All SSML elements except <audio> and <lexicon> are interpreted; those two elements are ignored. Pronunciation rules should still be applied, unless applying them would make the SSML document invalid. |
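The Python sketch below shows a hypothetical client-side check that could run before calling createSpeechFile(). It encodes the text as UTF-8 and rejects content types that are not in the table above. For SSML input, it also lists any <audio> or <lexicon> elements that the service would ignore. The function name prepare_text() is illustrative only. It checks XML well-formedness, not validity against the SSML 1.1 schema, and it does not reproduce the server's actual processing.

```python
import xml.etree.ElementTree as ET

# Content types taken from the table above.
ALLOWED_CONTENT_TYPES = {"text/plain", "text/html", "text/ssml"}
# SSML elements that the documentation says are ignored.
IGNORED_SSML_ELEMENTS = {"audio", "lexicon"}


def prepare_text(text: str, content_type: str) -> tuple[bytes, list[str]]:
    """Return the UTF-8 payload and the names of SSML elements the service ignores.

    Hypothetical helper: raises ValueError for an unsupported content type and
    xml.etree.ElementTree.ParseError for SSML that is not well-formed XML.
    """
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    ignored: list[str] = []
    if content_type == "text/ssml":
        root = ET.fromstring(text)
        for elem in root.iter():
            # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name in IGNORED_SSML_ELEMENTS:
                ignored.append(local_name)
    return text.encode("utf-8"), ignored
```

For example, an SSML document that contains an <audio> element is accepted, but "audio" appears in the returned list to warn the caller that the sound will not be played.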
- Voice
- The text-to-speech synthesiser selected to read the text. Only one voice can be selected for a single speech file. Each voice is identified by a voice identifier parameter. The following voices are currently available:
| voice id | voice name | voice language | voice gender |
|---|---|---|---|
| us_eric | Eric | American English | male |
| us_jennifer | Jennifer | American English | female |
| us_joey | Joey | American English | male |
| us_kendra | Kendra | American English | female |
| us_kimberly | Kimberly | American English | female |
| us_chipmunk | Chipmunk | American English | none |
| us_salli | Salli | American English | female (teenager) |
| us_ivy | Ivy | American English | female (child) |
| au_nicole | Nicole | Australian English | female |
| es_us_penelope | Penelope | American Spanish | female |
| es_us_miguel | Miguel | American Spanish | male |
| gb_amy | Amy | British English | female |
| gb_brian | Brian | British English | male |
| gb_emma | Emma | British English | female |
| en_wls_geraint | Geraint | Welsh English | male |
| en_wls_gwyneth | Gwyneth | Welsh English | female |
| cy_geraint | Geraint | Welsh | male |
| cy_gwyneth | Gwyneth | Welsh | female |
| de_marlene | Marlene | German | female |
| de_hans | Hans | German | male |
| es_conchita | Conchita | Castilian Spanish | female |
| es_enrique | Enrique | Castilian Spanish | male |
| fr_mathieu | Mathieu | French | male |
| fr_celine | Celine | French | female |
| pl_ewa | Ewa | Polish | female |
| pl_jacek | Jacek | Polish | male |
| pl_jan | Jan | Polish | male |
| pl_maja | Maja | Polish | female |
| ro_carmen | Carmen | Romanian | female |
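A client may want to check voice identifiers locally before it sends a request. The sketch below is a hypothetical catalogue built from the table above. The names Voice, check_voice() and voices_for() are illustrative and are not part of the API. If the service adds or removes voices, the list must be updated by hand.

```python
from typing import NamedTuple


class Voice(NamedTuple):
    voice_id: str
    name: str
    language: str
    gender: str


# Copied from the voice table; update manually if the service changes.
VOICES = [
    Voice("us_eric", "Eric", "American English", "male"),
    Voice("us_jennifer", "Jennifer", "American English", "female"),
    Voice("us_joey", "Joey", "American English", "male"),
    Voice("us_kendra", "Kendra", "American English", "female"),
    Voice("us_kimberly", "Kimberly", "American English", "female"),
    Voice("us_chipmunk", "Chipmunk", "American English", "none"),
    Voice("us_salli", "Salli", "American English", "female (teenager)"),
    Voice("us_ivy", "Ivy", "American English", "female (child)"),
    Voice("au_nicole", "Nicole", "Australian English", "female"),
    Voice("es_us_penelope", "Penelope", "American Spanish", "female"),
    Voice("es_us_miguel", "Miguel", "American Spanish", "male"),
    Voice("gb_amy", "Amy", "British English", "female"),
    Voice("gb_brian", "Brian", "British English", "male"),
    Voice("gb_emma", "Emma", "British English", "female"),
    Voice("en_wls_geraint", "Geraint", "Welsh English", "male"),
    Voice("en_wls_gwyneth", "Gwyneth", "Welsh English", "female"),
    Voice("cy_geraint", "Geraint", "Welsh", "male"),
    Voice("cy_gwyneth", "Gwyneth", "Welsh", "female"),
    Voice("de_marlene", "Marlene", "German", "female"),
    Voice("de_hans", "Hans", "German", "male"),
    Voice("es_conchita", "Conchita", "Castilian Spanish", "female"),
    Voice("es_enrique", "Enrique", "Castilian Spanish", "male"),
    Voice("fr_mathieu", "Mathieu", "French", "male"),
    Voice("fr_celine", "Celine", "French", "female"),
    Voice("pl_ewa", "Ewa", "Polish", "female"),
    Voice("pl_jacek", "Jacek", "Polish", "male"),
    Voice("pl_jan", "Jan", "Polish", "male"),
    Voice("pl_maja", "Maja", "Polish", "female"),
    Voice("ro_carmen", "Carmen", "Romanian", "female"),
]

# Index by identifier for constant-time lookups.
VOICES_BY_ID = {voice.voice_id: voice for voice in VOICES}


def check_voice(voice_id: str) -> Voice:
    """Return the catalogue entry for voice_id, or raise ValueError if unknown."""
    try:
        return VOICES_BY_ID[voice_id]
    except KeyError:
        raise ValueError(f"unknown voice id: {voice_id!r}") from None


def voices_for(language: str) -> list[str]:
    """List voice ids whose 'voice language' column matches exactly."""
    return [voice.voice_id for voice in VOICES if voice.language == language]
```

The match is exact, so voices_for("Welsh") returns only cy_geraint and cy_gwyneth. The Welsh English voices are listed separately.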
- Pronunciation Rules
- A table of rules (simple text substitutions and regular-expression substitutions) used to preprocess uploaded texts before the voice synthesises them. Pronunciation rules are mainly used to improve the pronunciation of specific words that the selected voice reads differently from what is intended (especially abbreviations, foreign words, etc.). They can also remove parts of the text (specific sections, symbols, etc.) that should not be heard in the spoken version. There are two types of pronunciation rules:
  - internal pronunciation rules, which are part of IVONA TTS SaaS and are always applied to the uploaded text (they cover the pronunciation of the most common abbreviations, foreign names and specific grammatical constructions);
  - user pronunciation rules, which users can add themselves and which are visible only to their owner and to the IVONA TTS SaaS engine.

  Every pronunciation rule is assigned to a specific language. When a speech file is generated with a voice intended for a specific language (for example, Brian for English), the user pronunciation rules created for that language are applied automatically BEFORE the internal pronunciation rules. The character price of a single download of the speech file is calculated AFTER the text has been processed with the pronunciation rules. Pronunciation rules are grouped into the following languages, for which voices are available:
| language id | language name | list of voices assigned |
|---|---|---|
| en | English | us_chipmunk, us_jennifer, us_eric, us_kendra, us_joey, us_kimberly, us_salli, us_ivy, gb_amy, gb_brian, gb_emma, au_nicole, en_wls_geraint, en_wls_gwyneth |
| pl | Polish | pl_ewa, pl_maja, pl_jacek, pl_jan |
| ro | Romanian | ro_carmen |
| de | German | de_hans, de_marlene |
| es | Spanish | es_conchita, es_enrique, es_us_miguel, es_us_penelope |
| fr | French | fr_celine, fr_mathieu |
| cy | Welsh | cy_geraint, cy_gwyneth |
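The processing order described above can be sketched as follows: first the user rules for the voice's language, then the internal rules. This is an illustration only. The Rule type and the preprocess() function are hypothetical, and Python's re module stands in for the regular-expression dialect that IVONA TTS SaaS actually uses, which is not described here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical pronunciation rule: a plain or regular-expression substitution."""
    pattern: str
    replacement: str
    is_regex: bool = False

    def apply(self, text: str) -> str:
        if self.is_regex:
            # ASSUMPTION: Python's re syntax stands in for the service's dialect.
            return re.sub(self.pattern, self.replacement, text)
        return text.replace(self.pattern, self.replacement)


# Voice-to-language assignment copied from the table above.
RULE_LANGUAGES = {
    "en": ["us_chipmunk", "us_jennifer", "us_eric", "us_kendra", "us_joey",
           "us_kimberly", "us_salli", "us_ivy", "gb_amy", "gb_brian", "gb_emma",
           "au_nicole", "en_wls_geraint", "en_wls_gwyneth"],
    "pl": ["pl_ewa", "pl_maja", "pl_jacek", "pl_jan"],
    "ro": ["ro_carmen"],
    "de": ["de_hans", "de_marlene"],
    "es": ["es_conchita", "es_enrique", "es_us_miguel", "es_us_penelope"],
    "fr": ["fr_celine", "fr_mathieu"],
    "cy": ["cy_geraint", "cy_gwyneth"],
}
VOICE_TO_LANGUAGE = {v: lang for lang, voices in RULE_LANGUAGES.items() for v in voices}


def preprocess(text: str, voice_id: str,
               user_rules: dict[str, list[Rule]],
               internal_rules: dict[str, list[Rule]]) -> str:
    """Apply the user rules, then the internal rules, for the voice's language."""
    language = VOICE_TO_LANGUAGE[voice_id]  # KeyError for an unknown voice id
    for rule in user_rules.get(language, []):      # user rules run first
        text = rule.apply(text)
    for rule in internal_rules.get(language, []):  # internal rules run second
        text = rule.apply(text)
    return text
```

Suppose a user rule replaces "St." with "Saint" and an internal rule replaces "St." with "Street". Then preprocess("St. Paul", "gb_brian", ...) yields "Saint Paul", because the user rule runs first. The character price is then based on this processed text.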
- Codec
- The name of the audio codec used to generate the speech file. The codec name is passed as one of the parameters of the createSpeechFile() method. The following codecs are currently available through the API:
| codec id | codec description |
|---|---|
| mp3/22050 | MP3, 64 kbit/s, 22.05 kHz |
| ogg/22050 | OGG, 45 kbit/s, 22.05 kHz |
| pcm16/22050* | Uncompressed WAV file, 16 bit, 22.05 kHz |
| pcm16/8000* | Uncompressed WAV file, 16 bit, 8 kHz |
| alaw/8000* | WAV companded with the A-law algorithm (for telecom purposes) |
| ulaw/8000* | WAV companded with the µ-law algorithm (for telecom purposes) |

(*) Non-streamable formats are available on demand; contact sales@ivona.com.
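The sketch below parses a codec identifier and gives a rough estimate of the audio size for a given duration. The MP3 and OGG bit rates come from the table. The rates for the uncompressed and companded formats ASSUME mono audio at 16 and 8 bits per sample, which the table does not state. Container headers are ignored, so real files will be slightly larger. parse_codec() and estimated_size_bytes() are hypothetical helpers.

```python
from dataclasses import dataclass

# Bit rates in bits per second. MP3/OGG values come from the codec table;
# the PCM, A-law and µ-law values ASSUME mono audio.
BIT_RATES = {
    "mp3/22050": 64_000,
    "ogg/22050": 45_000,
    "pcm16/22050": 16 * 22_050,
    "pcm16/8000": 16 * 8_000,
    "alaw/8000": 8 * 8_000,  # A-law companding stores 8 bits per sample
    "ulaw/8000": 8 * 8_000,  # µ-law companding stores 8 bits per sample
}
# Codecs marked with (*) in the table.
NON_STREAMABLE = {"pcm16/22050", "pcm16/8000", "alaw/8000", "ulaw/8000"}


@dataclass(frozen=True)
class Codec:
    codec_id: str
    fmt: str
    sample_rate_hz: int
    streamable: bool


def parse_codec(codec_id: str) -> Codec:
    """Split an id such as 'mp3/22050' into format and sample rate."""
    if codec_id not in BIT_RATES:
        raise ValueError(f"unknown codec id: {codec_id!r}")
    fmt, rate = codec_id.split("/")
    return Codec(codec_id, fmt, int(rate), codec_id not in NON_STREAMABLE)


def estimated_size_bytes(codec_id: str, seconds: float) -> int:
    """Rough audio payload size in bytes; ignores headers and metadata."""
    parse_codec(codec_id)  # validates the id
    return round(BIT_RATES[codec_id] * seconds / 8)
```

For example, one minute of mp3/22050 audio at 64 kbit/s is about 64,000 × 60 / 8 = 480,000 bytes.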
- Sound file parameters
- Parameters that affect the generated speech file. They can, for example, change the speed, volume, pitch and other properties of the audio, or set specific values for the file’s ID3v2 tags. All parameters are optional and have default values set by IVONA TTS SaaS. New parameters will be added in the future. The following parameters are currently available:
Basic parameters:

| parameter name | description | value range | default value | additional info |
|---|---|---|---|---|
| Prosody-Volume | volume of the recording, as a percentage of the voice’s original volume | 0-100 | 100 | changes only the default volume used during sound encoding; the volume can still be adjusted by the player or device on which the file is played |
| Prosody-Rate | speed of the recording, as a percentage of the voice’s original speed | 50-200 | 100 | useful in solutions for visually impaired people, who are accustomed to faster speech, or for foreign language learning, where a slower speed is more suitable |
| Sentence-Break | pause between sentences, in milliseconds | 0-3000 | 400 | useful in solutions that dictate texts to their listeners |
| Paragraph-Break | pause between paragraphs (separated by empty lines in the uploaded text), in milliseconds | 0-5000 | 650 | useful in solutions that split speech into separate blocks |

ID3v2 tags set for MP3 files:

| parameter name | description | interpreted by IVONA.com Flash Player? | default value (if not set by user) | value example |
|---|---|---|---|---|
| Id3v2-TIT2 | frame TIT2 in ID3v2.4 | yes (shows the name of the file in modes 1 and 2 of the player) | - | my speech file |
| Id3v2-TPE1 | frame TPE1 in ID3v2.4 | yes (shows the author of the file in modes 1 and 2 of the player) | www.ivona.com | John Smith |
| Id3v2-TPE3 | frame TPE3 in ID3v2.4 | yes (links the name of the file in modes 1 and 2 of the player) | - | |
| Id3v2-TPE4 | frame TPE4 in ID3v2.4 | yes (shows the image assigned to the file in modes 1 and 2 of the player) | - | |
| Id3v2-TDTG | frame TDTG in ID3v2.4 | no | (the time of file encoding) | 2010-02-01T12:00:05 |
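Below is a hypothetical helper that assembles the parameters listed above. It fills in the defaults and rejects out-of-range values before a request is sent. The ranges and defaults are copied from the tables. Two choices are assumptions of this sketch: the basic parameters are treated as integers, and ID3v2 tags are rejected for codecs other than MP3. The sketch also does not show how the parameters are actually passed to createSpeechFile().

```python
# (minimum, maximum, default) copied from the basic-parameters table.
BASIC_PARAMETERS = {
    "Prosody-Volume": (0, 100, 100),
    "Prosody-Rate": (50, 200, 100),
    "Sentence-Break": (0, 3000, 400),
    "Paragraph-Break": (0, 5000, 650),
}
# ID3v2 tag parameters from the second table (MP3 files only).
ID3_TAGS = {"Id3v2-TIT2", "Id3v2-TPE1", "Id3v2-TPE3", "Id3v2-TPE4", "Id3v2-TDTG"}


def build_sound_parameters(codec_id: str, overrides: dict[str, object]) -> dict[str, object]:
    """Merge user overrides into the documented defaults, validating each one."""
    params: dict[str, object] = {
        name: default for name, (_, _, default) in BASIC_PARAMETERS.items()
    }
    for name, value in overrides.items():
        if name in BASIC_PARAMETERS:
            low, high, _ = BASIC_PARAMETERS[name]
            # ASSUMPTION: values are whole numbers; bool is excluded explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in {low}-{high}, got {value!r}")
        elif name in ID3_TAGS:
            # ASSUMPTION: reject tags for non-MP3 output instead of ignoring them.
            if not codec_id.startswith("mp3/"):
                raise ValueError(f"{name} applies only to MP3 files")
            if not isinstance(value, str):
                raise ValueError(f"{name} must be a string")
        else:
            raise ValueError(f"unknown parameter: {name}")
        params[name] = value
    return params
```

For instance, build_sound_parameters("mp3/22050", {"Prosody-Rate": 150}) keeps Sentence-Break at its default of 400 and sets the rate to 150.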
- Sound effects
- Additional sound effects can be added on special request. Contact sales@ivona.com to arrange a separate agreement for creating a modified voice.
- Characters price
- The “price” of downloading a file, deducted from the user’s account. When a user activates an IVONA TTS SaaS service on their account, a specific number of characters is added to the account. This number depends on the type of agreement the user has signed with the IVONA.com sales department. For trial services the number is standardized (see http://www.ivona.com/saas.php for details). Each time a speech file is downloaded, the number of characters calculated by IVONA TTS SaaS for that file is deducted from the user’s account, so every repeated download of the same file is charged again. The price depends on the size of the uploaded text after it has been processed with the pronunciation rules. Users can check the price of a specific text at any time with the checkPrice() API method.
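Every download is charged again, so a client that downloads the same file more than once may want to track its remaining balance. The class below is a hypothetical local ledger. In a real client, the price passed to record_download() would come from the checkPrice() method, and the authoritative balance is the one kept by IVONA TTS SaaS. The documentation does not describe what happens when the balance is too low; refusing the download in that case is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterBalance:
    """Hypothetical client-side mirror of the account's character pool."""
    available: int
    history: list[tuple[str, int]] = field(default_factory=list)

    def record_download(self, file_id: str, price: int) -> int:
        """Deduct one download's character price and return the new balance."""
        if price < 0:
            raise ValueError("price must be non-negative")
        if price > self.available:
            # ASSUMPTION: behaviour on insufficient balance is undocumented.
            raise RuntimeError(f"not enough characters for {file_id}: "
                               f"need {price}, have {self.available}")
        self.available -= price
        self.history.append((file_id, price))  # repeated downloads are logged too
        return self.available
```

Downloading a 120-character file twice from a balance of 1000 leaves 760, because each download is charged separately.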