The Ivona Text-to-Speech synthesizer is a versatile system that correctly transforms most written data into human-like, natural speech. The Ivona synthesizer operates on written, fully expanded words. However, input text documents contain not only full words, such as mleko and cukier, but also various other language units, such as numbers (15), dates (3/4/2003), acronyms (CBA), abbreviations (ul.), symbols ($), etc. All individual language units must first be consistently expanded into full words before they are synthesized. This conversion takes place internally within the synthesizer and is called text normalization.
The Polish Text-To-Speech voices correctly normalize and synthesize the majority of Polish texts. This document describes various text normalization processes that all written input data undergoes before being synthesized.
The text normalization processes can be extended by means of the Ivona regular expressions lexicon and by using PLS lexicons (W3C Recommendation) which are fully customizable by the end-user.
This section describes how unannotated input text is split into paragraphs, sentences and words.
Paragraphs are separated by empty lines.
Paragraphs may be explicitly marked with SSML elements p.
By default, a sentence contains fewer than 1000 characters. Longer sentences are broken into multiple smaller sentences.
Sentences may be explicitly marked with SSML elements s.
By default, a word contains fewer than 100 characters. Longer words are broken into multiple smaller words.
Words without any vowels will be spelled out.
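The splitting rules above can be illustrated with a minimal Python sketch. It is not Ivona's implementation: the function names are hypothetical, and the way an over-long token is cut (fixed-size pieces) is an assumption, since this document only states that such tokens are broken up.

```python
import re

# Assumption: "fewer than 100 characters" is modelled as at most 99.
MAX_WORD_LEN = 99


def split_paragraphs(text: str) -> list[str]:
    """Split unannotated text into paragraphs on one or more empty lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def chunk(token: str, limit: int = MAX_WORD_LEN) -> list[str]:
    """Break an over-long token into consecutive pieces of at most `limit` characters.

    Hypothetical strategy: the real synthesizer may choose break points differently.
    """
    return [token[i:i + limit] for i in range(0, len(token), limit)]
```

For example, split_paragraphs("a\nb\n\nc") returns two paragraphs, and a 250-character word is cut into pieces of 99, 99 and 52 characters.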
Ivona accepts all Unicode characters. Ivona handles most characters found in texts based on the Latin script.
Punctuation plays a key role in the way texts are interpreted by the TTS system. Ivona supports the majority of punctuation marks found in Polish texts. Ultimately, however, all punctuation marks that affect pauses or intonation are mapped to the following marks.
rising or falling
This section describes in general how Ivona normalizes input text, excluding text fragments marked with the SSML say-as element.
This section is not exhaustive. Ivona normalizes various text units but only the most common ones have been included in this description.
A cardinal number is either any single digit (0, 1, …, 9) or a sequence of digits not starting with 0.
Longer cardinal numbers may make use of dot as a thousands separator.
10,000 will be pronounced dziesięć tysięcy.
256 will be pronounced dwieście pięćdziesiąt sześć.
4358 will be pronounced cztery tysiące trzysta pięćdziesiąt osiem.
1.000 will be pronounced tysiąc.
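As an illustration of how a cardinal number expands into Polish words, the following Python sketch covers 0 to 999 999. It is a simplified model written for this description, not the synthesizer's code. The function names are hypothetical, and only the dot is handled as a thousands separator.

```python
import re

UNITS = ["zero", "jeden", "dwa", "trzy", "cztery", "pięć", "sześć", "siedem", "osiem", "dziewięć"]
TEENS = ["dziesięć", "jedenaście", "dwanaście", "trzynaście", "czternaście",
         "piętnaście", "szesnaście", "siedemnaście", "osiemnaście", "dziewiętnaście"]
TENS = ["", "", "dwadzieścia", "trzydzieści", "czterdzieści", "pięćdziesiąt",
        "sześćdziesiąt", "siedemdziesiąt", "osiemdziesiąt", "dziewięćdziesiąt"]
HUNDREDS = ["", "sto", "dwieście", "trzysta", "czterysta", "pięćset",
            "sześćset", "siedemset", "osiemset", "dziewięćset"]

# Single digit, dot-separated groups of three, or plain digits not starting with 0.
CARDINAL_RE = re.compile(r"0|[1-9]\d{0,2}(?:\.\d{3})+|[1-9]\d*")


def below_thousand(n: int) -> list[str]:
    """Words for 1..999; an empty list for 0."""
    words = []
    if n >= 100:
        words.append(HUNDREDS[n // 100])
    rest = n % 100
    if 10 <= rest <= 19:
        words.append(TEENS[rest - 10])
    else:
        if rest >= 20:
            words.append(TENS[rest // 10])
        if rest % 10:
            words.append(UNITS[rest % 10])
    return words


def thousand_form(k: int) -> str:
    """Pick the grammatical form of 'thousand' that agrees with k."""
    if k == 1:
        return "tysiąc"
    if k % 10 in (2, 3, 4) and k % 100 not in (12, 13, 14):
        return "tysiące"
    return "tysięcy"


def cardinal_to_words(n: int) -> str:
    if not 0 <= n <= 999_999:
        raise ValueError("sketch only covers 0..999999")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        if thousands > 1:          # "tysiąc", not "jeden tysiąc"
            words += below_thousand(thousands)
        words.append(thousand_form(thousands))
    words += below_thousand(rest)
    return " ".join(words)


def parse_cardinal(token: str) -> int:
    """Validate a cardinal token and strip dot thousands separators."""
    if CARDINAL_RE.fullmatch(token) is None:
        raise ValueError(f"not a cardinal number: {token!r}")
    return int(token.replace(".", ""))
```

With these definitions, cardinal_to_words(parse_cardinal("1.000")) yields tysiąc and cardinal_to_words(4358) yields cztery tysiące trzysta pięćdziesiąt osiem, matching the examples above.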
A signed integer consists of a sign character followed immediately by a cardinal number. Valid sign characters are the plus sign (+), the minus sign (−, U+2212) and the plus-minus sign (±). The popular hyphen-minus character (-), as well as other dash-like characters, are also supported as the sign character, but they are ambiguous and should best be avoided.
+5 will be pronounced plus pięć.
−3,000 will be pronounced minus trzy tysiące.
A cardinal or signed integer followed immediately by the comma and a sequence of digits will be recognized as a real number.
4,5 will be pronounced cztery przecinek pięć.
-3,1 will be pronounced minus trzy przecinek jeden.
1.000,12 will be pronounced tysiąc przecinek dwanaście.
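The definitions of cardinal numbers, signed integers and real numbers can be summarized as a single pattern. The Python sketch below is an assumption-laden illustration: the names SIGNS, NUMBER_RE and classify are hypothetical, and it does not model how the synthesizer resolves a comma that could be either a decimal mark or a thousands separator.

```python
import re

# Plus sign, minus sign (U+2212), plus-minus sign (U+00B1), and the ambiguous hyphen-minus.
SIGNS = "+\u2212\u00b1-"

# Single digit, dot-separated groups of three, or plain digits not starting with 0.
CARDINAL = r"(?:0|[1-9]\d{0,2}(?:\.\d{3})+|[1-9]\d*)"

NUMBER_RE = re.compile(
    rf"(?P<sign>[{re.escape(SIGNS)}])?(?P<int>{CARDINAL})(?:,(?P<frac>\d+))?"
)


def classify(token: str) -> str:
    """Return 'cardinal', 'signed integer', 'real' or 'other' for a token."""
    m = NUMBER_RE.fullmatch(token)
    if m is None:
        return "other"
    if m["frac"] is not None:
        return "real"
    if m["sign"]:
        return "signed integer"
    return "cardinal"
```

Under this sketch, +5 is a signed integer, while 4,5, -3,1 and 1.000,12 are real numbers.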
A cardinal number with the suffix -i, -y or -sty is interpreted as an ordinal number. A cardinal number with the suffix st, nd, rd or th is also interpreted as an ordinal number.
21-y will be pronounced dwudziesty pierwszy.
42-i will be pronounced czterdziesty drugi.
6-sty will be pronounced szósty.
A cardinal followed by any suffix follows the same pattern as regular plural Polish words, as shown in the examples below.
1-ym will be pronounced pierwszym.
100-ym will be pronounced setnym.
20-ą will be pronounced dwudziestą.
Ivona supports various Roman numerals.
All uppercase Roman numerals with an appropriate lowercase ordinal suffix are pronounced as ordinal numbers.
LI-y will be pronounced pięćdziesiąty pierwszy.
MMXI-ym will be pronounced dwa tysiące jedenastym.
In the contexts shown below, among others, Roman numerals up to XXXIX, except I, V and X, are pronounced as ordinal numbers.
Rozdział XIX will be pronounced rozdział dziewiętnasty.
Punkt L will be pronounced punkt pięćdziesiąty.
II Wojna Światowa will be pronounced druga wojna światowa.
Jan Paweł II will be pronounced jan paweł drugi.
II Tura will be pronounced druga tura.
Small uppercase and lowercase Roman numerals in other contexts will be pronounced as cardinal numbers.
xxiii will be pronounced dwadzieścia trzy.
viii will be pronounced osiem.
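Before a Roman numeral can be read as a cardinal or ordinal number, it has to be converted to its numeric value. The following Python sketch shows the standard right-to-left conversion. The function name is hypothetical, and the sketch does not validate that the input is a well-formed numeral, nor does it decide between cardinal and ordinal reading.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert an uppercase or lowercase Roman numeral to an integer.

    Scans from the right: a symbol smaller than its right-hand neighbour
    is subtracted (the I in IX), otherwise it is added.
    Raises KeyError for characters that are not Roman digits.
    """
    total = 0
    prev = 0
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
        prev = value
    return total
```

For the examples in this section, roman_to_int("XIX") gives 19, roman_to_int("MMXI") gives 2011 and roman_to_int("xxiii") gives 23.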
In Polish, whether a number is read as a fraction depends on the context. The fraction interpretation usually applies to digits separated by a comma and followed by a unit of measurement.
0,5 km will be pronounced pół kilometra.
1,5 godz will be pronounced półtorej godziny.
0,2 sek will be pronounced zero i dwie dziesiąte sekundy.
In other cases, the fraction interpretation can be forced with the SSML say-as element. More details can be found in section 4.8 of this documentation.
Sequences of more than one digit starting with 0 are always read as a sequence of digits.
Digits in fixed formats, such as telephone numbers, are handled similarly.
0123 will be pronounced zero jeden dwa trzy.
058 783 49 51 will be pronounced zero pięć osiem siedem osiem trzy czterdzieści dziewięć pięćdziesiąt jeden.
42 657-32-32 will be pronounced cztery dwa sześć pięć siedem trzydzieści dwa trzydzieści dwa.
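The leading-zero rule can be sketched in Python as follows. The helper names are hypothetical. The sketch reads every digit separately and does not reproduce the pairing of digits seen in the telephone number examples (for instance 49 read as czterdzieści dziewięć).

```python
DIGIT_NAMES = ["zero", "jeden", "dwa", "trzy", "cztery",
               "pięć", "sześć", "siedem", "osiem", "dziewięć"]


def is_leading_zero_sequence(token: str) -> bool:
    """True for a sequence of more than one ASCII digit that starts with 0."""
    return len(token) > 1 and token.isascii() and token.isdigit() and token[0] == "0"


def read_digits(token: str) -> str:
    """Read a string of digits one digit at a time."""
    return " ".join(DIGIT_NAMES[int(c)] for c in token)
```

Here read_digits("0123") produces zero jeden dwa trzy, in line with the first example above.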
Ivona handles a wide variety of commonly as well as rarely used units, including metric and imperial systems. Some unit symbols are always recognized, others need a preceding number.
10 ohm will be pronounced dziesięć omów.
-8 °C will be pronounced minus osiem stopni celsjusza.
15 dB will be pronounced piętnaście decybeli.
3 oz will be pronounced trzy uncje.
2,6 GHz will be pronounced dwa i sześć dziesiątych gigaherca.
25 ha will be pronounced dwadzieścia pięć hektarów.
100 Wh will be pronounced sto watogodzin.
2% will be pronounced dwa procent.
40 km/h will be pronounced czterdzieści kilometrów na godzinę.
Ivona supports a certain number of currencies in multiple formats. Valid currency symbols include commonly used ones such as $ and zł.
The number may be followed by the words milion, bilion, miliard, or their various cases or abbreviations. In this case the currency will be pronounced at the end.
The value may have a thousands separator which may be either a comma or a space.
$10 will be pronounced dziesięć dolarów.
50zł will be pronounced pięćdziesiąt złotych.
100 EUR will be pronounced sto euro.
12 GBP will be pronounced dwanaście funtów.
$10 milionów will be pronounced dziesięć milionów dolarów.
Ivona supports time specified in the 24-hour clock format.
2:00 will be pronounced druga.
01:59 will be pronounced pierwsza pięćdziesiąt dziewięć.
13:00 will be pronounced trzynasta.
One-digit numbers for the day and for the month may have an optional leading zero.
Supported formats for month expressions: numbers (04), name (kwiecień), abbreviation (kwi).
The year can be expressed with 4 digits.
Standard US format (M/D/Y, M-D-Y, M.D.Y), default for American English voices:
12/31/2001 will be pronounced trzydziesty pierwszy grudnia dwa tysiące jeden.
gru/31/2001 will be pronounced trzydziesty pierwszy grudnia dwa tysiące jeden.
kwiecień-25-2001 will be pronounced dwudziesty piąty kwietnia dwa tysiące jeden.
European format (D/M/Y, D-M-Y, D.M.Y), default for European voices:
12/maj/2001 will be pronounced dwunasty maja dwa tysiące jeden.
12-gru-2007 will be pronounced dwunasty grudnia dwa tysiące siedem.
20.03.2011 will be pronounced dwudziesty marca dwa tysiące jedenaście.
ISO 8601 standard (Y-M-D, Y/M/D, Y.M.D), only 4-digit year:
2007/01/01 will be pronounced pierwszy stycznia dwa tysiące siedem.
2007-sty-01 will be pronounced pierwszy stycznia dwa tysiące siedem.
2007-styczeń-01 will be pronounced pierwszy stycznia dwa tysiące siedem.
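The three field orders above differ only in where the day, month and year appear, so one parser can cover them. The Python sketch below is an illustration under stated assumptions: the function names and the order parameter are hypothetical, and the abbreviation of each month is assumed to be the first three letters of its nominative name (which agrees with sty, kwi, maj and gru in the examples, but is not confirmed for the other months).

```python
import re

MONTH_NAMES = ["styczeń", "luty", "marzec", "kwiecień", "maj", "czerwiec",
               "lipiec", "sierpień", "wrzesień", "październik", "listopad", "grudzień"]


def month_number(token: str):
    """Map a month given as a number, a name or an abbreviation to 1..12, else None."""
    t = token.lower()
    if t.isdigit():
        n = int(t)
        return n if 1 <= n <= 12 else None
    for i, name in enumerate(MONTH_NAMES, start=1):
        # Assumption: the abbreviation is the first three letters of the name.
        if t == name or t == name[:3]:
            return i
    return None


def parse_date(token: str, order: str = "dmy"):
    """Parse a date with '/', '-' or '.' separators.

    `order` is 'dmy', 'mdy' or 'ymd'. Returns (year, month, day) or None.
    Only 4-digit years are accepted.
    """
    parts = re.split(r"[/.\-]", token)
    if len(parts) != 3 or sorted(order) != ["d", "m", "y"]:
        return None
    fields = dict(zip(order, parts))
    month = month_number(fields["m"])
    if month is None or not fields["d"].isdigit():
        return None
    if re.fullmatch(r"\d{4}", fields["y"]) is None:
        return None
    day = int(fields["d"])
    if not 1 <= day <= 31:
        return None
    return (int(fields["y"]), month, day)
```

For instance, parse_date("12-gru-2007") and parse_date("2007-styczeń-01", order="ymd") return (2007, 12, 12) and (2007, 1, 1) respectively.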
Other common formats:
2 czerwiec will be pronounced drugi czerwiec.
Od 13:00 do 14:00 will be pronounced od trzynastej do czternastej.
czwartek, 10 marca will be pronounced czwartek dziesiąty marca.
A number will be read as a year if it is followed by p.n.e. or if it is preceded or followed by n.e.:
1023 p.n.e. will be pronounced tysiąc dwadzieścia trzy przed naszą erą.
Ivona interprets ranges of numbers separated with dashes.
3 – 5 will be pronounced trzy do pięć.
Most abbreviations will be expanded to full words. There will be no sentence break on the dot sign (full stop) following a supported abbreviation. In order to force a sentence break please use two dot signs: one to mark the abbreviation and one to mark the sentence ending.
np. will be interpreted as na przykład.
Ul. will be interpreted as ulica.
Initialisms with a period (dot) following each letter (e.g. U.S., F.B.I.) will be pronounced by spelling out each letter.
Most common initialisms without dots (e.g. PZU, PKO) will also be recognized as such and pronounced properly.
All vowelless words are recognized as initialisms.
HTML will be pronounced ha te em el.
raport IT will be pronounced raport aj ti.
NBP will be pronounced en be pe.
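Two of the rules above, dotted initialisms and vowelless words, are purely formal and can be sketched in Python. The names used are hypothetical, and the set of Polish vowels is an assumption of this sketch. Initialisms such as PZU, which contain a vowel, would need a word list that is not modelled here.

```python
import re

# Assumption: the Polish vowel letters considered by the rule.
POLISH_VOWELS = set("aąeęioóuy")

# Two or more single letters, each followed by a dot, e.g. U.S. or F.B.I.
DOTTED_INITIALISM = re.compile(r"(?:[^\W\d_]\.){2,}")


def is_vowelless(word: str) -> bool:
    """True if the word contains letters but none of them is a vowel."""
    letters = [c for c in word.lower() if c.isalpha()]
    return bool(letters) and not any(c in POLISH_VOWELS for c in letters)


def is_dotted_initialism(token: str) -> bool:
    return DOTTED_INITIALISM.fullmatch(token) is not None
```

HTML and NBP are vowelless and would be spelled out under this rule, while np. is not matched as a dotted initialism because only its last letter is followed by a dot.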
In most cases Ivona properly recognizes and normalizes street addresses.
Al. Grunwaldzka 472, 80-309 Gdańsk, Polska will be pronounced aleja grunwaldzka czterysta siedemdziesiąt dwa, osiemdziesiąt trzysta dziewięć gdańsk, polska.
Ivona recognizes most Polish telephone number formats and reads them as series of digits.
Tel.058 783 49 51 will be pronounced telefon zero pięć osiem siedem osiem trzy czterdzieści dziewięć pięćdziesiąt jeden.
42 657-32-32 will be pronounced cztery dwa sześć pięć siedem trzydzieści dwa trzydzieści dwa.
(0-42)320-12-67 will be pronounced zero czterdzieści dwa trzysta dwadzieścia dwanaście sześćdziesiąt siedem.
0 768 121 430 will be pronounced zero siedem sześć osiem jeden dwa jeden cztery trzy zero.
Non-words not described elsewhere will be treated as identifiers. This group includes mixes of letters and digits, such as r121, as well as e-mail addresses, or fancy proper names unknown to the synthesizer.
Punctuation characters within identifiers will be pronounced.
r125lp will be pronounced er sto dwadzieścia pięć el pe.
B!0 will be pronounced be wykrzyknik zero.
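One way to picture how an identifier is read is to split it into runs of letters, runs of digits and single punctuation characters, and then pronounce each part in turn. The Python sketch below shows only the splitting step. The pattern and the function name are hypothetical and are not taken from the synthesizer.

```python
import re

# Runs of letters, runs of digits, or a single punctuation character.
PART_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")


def split_identifier(token: str) -> list[str]:
    """Split an identifier into letter runs, digit runs and punctuation marks."""
    return PART_RE.findall(token)
```

Under this sketch, r125lp splits into r, 125 and lp, and B!0 splits into B, ! and 0, which corresponds to the order of the spoken parts in the examples above.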
The SSML element say-as gives users the possibility to annotate fragments of text in order to force particular interpretation.
Marking a fragment with say-as disables most of the default normalization rules that would otherwise have been applied. Therefore, it is advised to use say-as sparingly, only when the default normalization rules fail and render speech different from what the user expects.
The W3C has issued a Working Group Note, SSML 1.0 say-as attribute values, which Ivona mostly follows.
Ivona will interpret a value as a date, when used within say-as with interpret-as="date". This works just as defined in the W3C note. The format attribute may be set to any of the following: mdy, dmy, ymd, md, dm, ym, my, d, m, y.
<say-as interpret-as="date" format="ymd">01/02/03</say-as> will be pronounced trzeci lutego dwa tysiące jeden.
A token such as 7'10" is recognized by default as a length in feet and inches. However, it can be forced to be read as a duration in minutes and seconds by enclosing it in say-as with interpret-as="time".
<say-as interpret-as="time">2'10"</say-as> will be pronounced dwie minuty i dziesięć sekund.
Ivona will read individual characters for text within the say-as element having interpret-as="characters". The format attribute is ignored. The detail attribute may be used to force pauses, as described in the W3C Note.
<say-as interpret-as="characters">klasa</say-as> will be pronounced ka el a es a.
<say-as interpret-as="characters" detail="3 1 2">1a3BZ7</say-as> will be pronounced jeden a trzy, b, z siedem.
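The effect of the detail attribute in the last example can be pictured as cutting the text into groups of the given sizes, with a pause between groups. The Python sketch below illustrates that grouping only. The function name is hypothetical, and placing any leftover characters in a final group is an assumption of this sketch rather than documented behaviour.

```python
def group_characters(text: str, detail: str) -> list[str]:
    """Cut `text` into consecutive groups whose sizes are listed in `detail`.

    `detail` is a space-separated list of group sizes, e.g. "3 1 2".
    Assumption: characters left over after the last size form one extra group.
    """
    sizes = [int(s) for s in detail.split()]
    groups = []
    pos = 0
    for size in sizes:
        if pos >= len(text):
            break
        groups.append(text[pos:pos + size])
        pos += size
    if pos < len(text):
        groups.append(text[pos:])
    return groups
```

For detail="3 1 2" and the text 1a3BZ7 this gives the groups 1a3, B and Z7, matching the pauses in the pronunciation jeden a trzy, b, z siedem.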
Ivona will attempt to read values within say-as having interpret-as="cardinal" as cardinal numbers. The format and detail attributes are ignored. Roman numerals are supported.
<say-as interpret-as="cardinal">CLI</say-as> will be pronounced sto pięćdziesiąt jeden.
Ivona will attempt to read values within say-as having interpret-as="ordinal" as ordinal numbers. The format and detail attributes are ignored. Roman numerals are supported.
<say-as interpret-as="ordinal">21</say-as> will be pronounced dwudziesty pierwszy.
<say-as interpret-as="ordinal">VI</say-as> will be pronounced szósty.
Ivona will interpret values within say-as having interpret-as="fraction" as common fractions.
<say-as interpret-as="fraction">2/9</say-as> will be pronounced dwie dziewiąte.
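When SSML is generated by a program, the say-as fragments shown in this section can be built with a small helper that takes care of escaping. The Python sketch below uses only the standard library. The helper name say_as is hypothetical and is not part of any Ivona API.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str, **attrs: str) -> str:
    """Wrap `text` in an SSML say-as element.

    Extra keyword arguments (for example format or detail) become attributes.
    Text content and attribute values are XML-escaped.
    """
    parts = [f"interpret-as={quoteattr(interpret_as)}"]
    for name, value in attrs.items():
        parts.append(f"{name}={quoteattr(value)}")
    return f"<say-as {' '.join(parts)}>{escape(text)}</say-as>"
```

Calling say_as("01/02/03", "date", format="ymd") reproduces the date example above, and say_as("2/9", "fraction") reproduces the fraction example.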
As mentioned at the very beginning of this text, it is sometimes necessary to modify texts to be synthesized in order to make them compatible with the system constraints and achieve the expected output. Ivona provides a set of special characters that work only in certain contexts, changing the way texts are being synthesized in terms of pronunciation or intonation. The characters are language-specific and do not apply to other languages unless specified otherwise in the language-specific documentation.
The stress can be adjusted with the ` character (backtick), which moves the accent to the following syllable. For example, Fabryka is pronounced differently depending on the voice. Adding ` unifies the pronunciation across voices.
Fab`ryka will result in a stress forced to ry.
Polish digraphs such as ch, sz, cz, si, ci and rz can be interpreted as two separate letters by placing the ' character (apostrophe) between them.
A question mark followed by a caret, also known as a circumflex (?^), can be used to force the intonation of a question to rise. Some questions by default have falling intonation. This can be changed by appending a caret to the question mark.
Czy wszystko w porządku?^ will result in a rising intonation.
A question mark followed by an underscore (?_) can be used to force the intonation of a question to fall. Yes/No questions by default have a rising intonation. This can be changed by appending the underscore character to the question mark.
Kupiłeś nowy samochód?_ will result in a falling intonation.
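Applications that prepare text automatically may want to add these control characters programmatically. The Python sketch below shows two hypothetical helpers, one for the intonation markers and one for the stress marker. They only edit the input string and make no claim about how the synthesizer processes it.

```python
def force_intonation(question: str, rising: bool) -> str:
    """Append ^ (rising) or _ (falling) to a question ending with '?'."""
    q = question.rstrip()
    if not q.endswith("?"):
        raise ValueError("expected text ending with a question mark")
    return q + ("^" if rising else "_")


def mark_stress(word: str, index: int) -> str:
    """Insert a backtick before the syllable that starts at `index`."""
    if not 0 <= index < len(word):
        raise ValueError("index outside the word")
    return word[:index] + "`" + word[index:]
```

For example, mark_stress("Fabryka", 3) returns Fab`ryka, and force_intonation("Kupiłeś nowy samochód?", rising=False) returns Kupiłeś nowy samochód?_.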