The Ivona Text-to-Speech synthesizer is a versatile system that correctly transforms most written data into human-like, natural speech. The Ivona synthesizer operates on written, fully expanded words. However, input text documents contain not only full words, such as mjölk and socker, but also various other language units, such as numbers (15), dates (3/4/2003), acronyms (USA), abbreviations (t.ex.), symbols ($), etc. All individual language units must be first consistently expanded into full words before they get synthesized. This conversion takes place internally within the synthesizer and is called text normalization.
The Swedish Ivona Text-To-Speech voices correctly normalize and synthesize the majority of Swedish texts. This document describes various text normalization processes that all written input data undergoes before being synthesized.
The text normalization processes can be extended by means of the Ivona regular expressions lexicon (described in a separate document) and by using PLS lexicons (W3C Recommendation) which are fully customizable by the end-user.
This section describes how unannotated input text is split into paragraphs, sentences and words.
Paragraphs are separated by empty lines.
Paragraphs may be explicitly marked with SSML elements p.
By default, a sentence contains fewer than 1000 characters. Longer sentences are broken into multiple shorter sentences.
Sentences may be explicitly marked with SSML elements s.
By default, a word contains fewer than 100 characters. Longer words are broken into multiple shorter words.
Words without any vowels will be spelled out.
Ivona will properly handle words with colons, such as the standard declension suffixes (:a, :ar, :arna, :en, etc.) or numeral suffixes.
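The segmentation rules above can be approximated in a few lines of Python. This is only an illustrative sketch, not Ivona's implementation: the function names are hypothetical, the vowel set is an assumption limited to Swedish letters, and the way an over-long word is cut (fixed-size chunks) is a guess, since the text does not say where the breaks fall.

```python
import re
from typing import List

MAX_WORD_LEN = 99  # "fewer than 100 characters", per the rule above
VOWELS = set("aeiouyåäöAEIOUYÅÄÖ")  # assumption: Swedish vowel letters only


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs on empty (or whitespace-only) lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def split_long_word(word: str, limit: int = MAX_WORD_LEN) -> List[str]:
    """Cut an over-long word into consecutive chunks of at most `limit` characters."""
    return [word[i:i + limit] for i in range(0, len(word), limit)] or [word]


def is_vowelless(word: str) -> bool:
    """True when a word has letters but no vowel, so it would be spelled out."""
    letters = [c for c in word if c.isalpha()]
    return bool(letters) and not any(c in VOWELS for c in letters)
```

For example, `is_vowelless("pwq")` is true, so such a token would be spelled letter by letter, while `mjölk` would be read as a word.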
Ivona accepts all Unicode characters. Ivona handles most characters found in texts based on the Latin script.
Punctuation plays a key role in the way texts are interpreted by the TTS system. Ivona supports the majority of punctuation marks found in Swedish texts. However, all punctuation marks that affect pauses or intonation are ultimately mapped to the following marks.
rising or falling
This section describes in general how Ivona normalizes input text, excluding text fragments marked with the SSML say-as element.
This section is not exhaustive. Ivona normalizes various text units but only the most common ones have been included in this description.
A cardinal number is either a single digit (0, 1, …, 9) or a sequence of digits not starting with 0.
Longer cardinal numbers may make use of a dot as a thousands separator.
10.000 will be pronounced tio-tusen.
256 will be pronounced två-hundra-femtiosex.
4358 will be pronounced fyra-tusen-tre-hundra-femtioåtta.
1.000 will be pronounced ett-tusen.
A signed integer consists of a sign character followed immediately by a cardinal number. Valid sign characters are the plus sign (+), the minus sign (−, U+2212) and the plus-minus sign (±). The popular hyphen-minus character (-), as well as other dash-like characters, are also supported as sign characters, but they are ambiguous and are best avoided.
+5 will be pronounced plus fem.
−3.000 will be pronounced minus tre-tusen.
A cardinal or signed integer followed immediately by a comma and a sequence of digits will be recognized as a real number.
4,5 will be pronounced fyra komma fem.
-3,1 will be pronounced minus tre komma en.
1.000,12 will be pronounced ett-tusen komma tolv.
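The syntax of cardinals, signed integers and real numbers described above can be summarised as a single regular expression. The sketch below is a reading of those rules, not the synthesizer's actual grammar: `classify_number` is a hypothetical name, only the hyphen-minus is accepted among the dash-like characters, and the requirement that a dot separates groups of exactly three digits is an assumption.

```python
import re
from typing import Optional, Tuple

# Cardinal: a lone "0", a digit run without a leading zero, or 1-3 digits
# followed by dot-separated groups of three digits (grouping is an assumption).
CARDINAL = r"(?:0|[1-9]\d*|[1-9]\d{0,2}(?:\.\d{3})+)"

# Optional sign (+, U+2212 minus, U+00B1 plus-minus, hyphen-minus), a cardinal,
# and an optional decimal part introduced by a comma.
NUMBER_RE = re.compile(
    r"(?P<sign>[+\u2212\u00b1-])?(?P<int>" + CARDINAL + r")(?:,(?P<frac>\d+))?"
)


def classify_number(token: str) -> Optional[Tuple[str, str, int, str]]:
    """Return (kind, sign, integer part, decimal digits), or None if no match."""
    m = NUMBER_RE.fullmatch(token)
    if m is None:
        return None
    if m["frac"] is not None:
        kind = "real"
    elif m["sign"]:
        kind = "signed integer"
    else:
        kind = "cardinal"
    return kind, m["sign"] or "", int(m["int"].replace(".", "")), m["frac"] or ""
```

With this sketch, `1.000,12` is classified as a real number with integer part 1000 and decimal digits `12`, whereas `0123` is rejected because of its leading zero.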
A cardinal number with suffixes a, :a, e or :e is interpreted as an ordinal number of the given gender. The :dra, :dje and :de ordinal suffixes are also supported in numerals that end with 2, 3 or 4 respectively.
21a will be pronounced tjugo-första.
42:a will be pronounced fyrtio-andra.
6e will be pronounced sjätte.
1.000.000:e will be pronounced en miljonte.
62:dra will be pronounced as sextio-andra.
3:dje will be pronounced as tredje.
44:de will be pronounced as fyrtio-fjärde.
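As a rough illustration of the suffix rules above, the following sketch recognises ordinal tokens and checks that :dra, :dje and :de follow a numeral ending in 2, 3 or 4 respectively. It applies the rule exactly as worded; the name `parse_ordinal` and the handling of thousands separators are assumptions, and the mapping from suffix to grammatical gender is deliberately left out.

```python
import re
from typing import Optional, Tuple

CARDINAL = r"(?:0|[1-9]\d*|[1-9]\d{0,2}(?:\.\d{3})+)"
ORDINAL_RE = re.compile(
    r"(?P<num>" + CARDINAL + r")(?P<suffix>:?[ae]|:dra|:dje|:de)"
)

# Long suffixes are only accepted after a numeral ending in the given digit.
REQUIRED_LAST_DIGIT = {":dra": "2", ":dje": "3", ":de": "4"}


def parse_ordinal(token: str) -> Optional[Tuple[int, str]]:
    """Return (numeric value, suffix) for an ordinal token, else None."""
    m = ORDINAL_RE.fullmatch(token)
    if m is None:
        return None
    num, suffix = m["num"], m["suffix"]
    required = REQUIRED_LAST_DIGIT.get(suffix)
    if required is not None and not num.endswith(required):
        return None
    return int(num.replace(".", "")), suffix
```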
Ivona supports various Roman numerals.
All uppercase Roman numerals with an appropriate common ordinal suffix separated with colon are pronounced as ordinal numbers.
LI:e will be pronounced femtio-förste.
XXVIII:a will be pronounced tjugoåttonde.
Uppercase Roman numerals in names of monarchs will be read as ordinal numbers preceded with the word den.
kejsarinnan Katarina I will be pronounced kejsarinnan katarina den första.
Gustav I:e will be pronounced gustav den förste.
Small uppercase and lowercase Roman numerals in other contexts will be pronounced as cardinal numbers.
Punkt IX will be pronounced punkt nio.
II Världskriget will be pronounced andra världskriget.
xxii will be pronounced tjugotvå.
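Whatever reading is chosen (ordinal or cardinal), a Roman numeral first has to be converted to its numeric value. A minimal sketch of the standard subtractive conversion is shown below; it is a generic algorithm, not taken from Ivona, and it does not reject malformed numerals such as IIII.

```python
from typing import Optional

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> Optional[int]:
    """Convert an upper- or lowercase Roman numeral to an integer.

    Returns None when the token contains a non-Roman character.
    """
    upper = numeral.upper()
    if not upper or any(ch not in ROMAN_VALUES for ch in upper):
        return None
    total = 0
    for i, ch in enumerate(upper):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[upper[i + 1]] if i + 1 < len(upper) else 0
        # A smaller value before a larger one is subtracted (IV = 4, IX = 9).
        total += -value if value < following else value
    return total
```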
A fraction consists of the following elements in order:
An optional sign character.
An optional whole number (cardinal) followed by the space character.
The numerator (a cardinal number).
The slash (/ U+002F) or the solidus character (⁄ U+2044).
The denominator (a cardinal number).
Fractions with the slash character are recognized only for the most common denominators. Fractions with the solidus character are always correctly recognized.
3/4 will be pronounced tre fjärdedelar.
2 1/2 will be pronounced två och en halv.
−7 2/3 will be pronounced minus sju och två tredjedelar.
15⁄5678 (solidus only) will be pronounced femton fem-tusen-sex-hundra-sjuttioåttondelar.
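The element order listed above maps directly onto a regular expression. The sketch below parses a fraction token into its sign, its magnitude and a flag telling whether the solidus was used. The name `parse_fraction` is hypothetical, cardinals are simplified to plain digit runs, and the list of "most common denominators" accepted with the slash is not modelled because the text does not enumerate it.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

CARD = r"(?:0|[1-9]\d*)"  # simplified cardinal, without thousands separators
FRACTION_RE = re.compile(
    rf"(?P<sign>[+\u2212\u00b1-])?"   # optional sign character
    rf"(?:(?P<whole>{CARD}) )?"       # optional whole number and a space
    rf"(?P<num>{CARD})"               # numerator
    rf"(?P<bar>[/\u2044])"            # slash or solidus
    rf"(?P<den>{CARD})"               # denominator
)


def parse_fraction(token: str) -> Optional[Tuple[str, Fraction, bool]]:
    """Return (sign, magnitude, uses_solidus), or None if the token does not match."""
    m = FRACTION_RE.fullmatch(token)
    if m is None or int(m["den"]) == 0:
        return None
    magnitude = int(m["whole"] or 0) + Fraction(int(m["num"]), int(m["den"]))
    return m["sign"] or "", magnitude, m["bar"] == "\u2044"
```

The sign is returned separately rather than applied to the value, because the plus-minus sign has no single numeric interpretation.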
Sequences of more than one digit starting with 0 are always read as a sequence of digits.
0123 will be pronounced noll ett två tre.
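The digit-by-digit reading can be sketched as a simple lookup. The function name is hypothetical; the Swedish digit names are the ones used in the examples throughout this document.

```python
from typing import Optional

DIGIT_WORDS = ["noll", "ett", "två", "tre", "fyra", "fem", "sex", "sju", "åtta", "nio"]


def read_digit_sequence(token: str) -> Optional[str]:
    """Spell a multi-digit token that starts with 0 digit by digit.

    Returns None for anything else, which would follow the other number rules.
    """
    if len(token) < 2 or token[0] != "0":
        return None
    if any(ch not in "0123456789" for ch in token):
        return None
    return " ".join(DIGIT_WORDS[int(ch)] for ch in token)
```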
Ivona handles a wide variety of commonly as well as rarely used units, including metric and imperial systems. Some unit symbols are always recognized, others need a preceding number.
14'5" will be pronounced fjorton fot fem tum.
1h2m30s will be pronounced en timme två minuter trettio sekunder.
5 tsp will be pronounced fem teskedar.
1 tbsp will be pronounced en matsked.
2,6 GHz will be pronounced två komma sex gigahertz.
25 km/h will be pronounced tjugofem kilometer i timmen.
8 nmi will be pronounced åtta nautiska mil.
-0,01% will be pronounced minus noll komma noll ett procent.
90° will be pronounced nittio grader.
Ivona supports a certain number of currencies in multiple formats. Valid currency symbols include commonly used symbols such as £, $, €, ¥, ₩, $AU, SG$, as well as many of the ISO 4217 currency codes (uppercase only).
The number may be followed by the words miljon, miljoner, miljard, miljarder, biljon, biljoner or their abbreviations. In this case the currency will be pronounced at the end.
The value may have a thousands separator which may be either a dot or a space.
50€ will be pronounced femtio euro.
EUR5.27 will be pronounced fem euro och tjugosju cent.
$10 will be pronounced tio dollar.
£5,27 will be pronounced fem pund och tjugosju pence.
GBP 1.000 will be pronounced ett tusen brittiska pund.
¥1 miljon will be pronounced en miljon yen.
¥5,27 will be pronounced fem yen och tjugosju sen.
CHF6M will be pronounced sex miljoner schweiziska franc.
€ 20 000 will be pronounced tjugo-tusen euro.
C$ 2,3 mn will be pronounced två komma tre miljoner kanadensiska dollar.
Ivona supports time specified in both the 12-hour and the 24-hour clock.
1:59 will be pronounced ett och femtionio.
2:00 will be pronounced två noll noll.
01:59am will be pronounced noll ett femtionio a m.
2 AM will be pronounced två a m.
13:00 will be pronounced tretton noll noll.
10:25:30 will be pronounced tio tjugofem och trettio.
07:53:10 A.M. will be pronounced noll sju femtiotre och tio a m.
Ivona also handles duration specified in multiple formats.
5'30" (only for seconds greater than 11) will be pronounced fem minuter trettio sekunder.
5m30s will be pronounced fem minuter trettio sekunder.
3h10m will be pronounced tre timmar tio minuter.
1t30m25s will be pronounced en timme trettio minuter tjugofem sekunder.
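The duration formats above can be recognised with two small patterns, sketched below. This is an approximation under stated assumptions: `parse_duration` is a hypothetical name, the letter notation is required to have at least two components (so that a bare 5m is not mistaken for a duration, which the text does not specify), and the rule that the prime notation needs more than 11 seconds is taken from the example above.

```python
import re
from typing import Optional, Tuple

PRIME_RE = re.compile(r"(\d+)'(\d+)\"")                       # 5'30"
LETTER_RE = re.compile(r"(?:(\d+)[ht])?(?:(\d+)m)?(?:(\d+)s)?")  # 1h2m30s, 1t30m25s


def parse_duration(token: str) -> Optional[Tuple[int, int, int]]:
    """Return (hours, minutes, seconds) for a duration token, else None."""
    m = PRIME_RE.fullmatch(token)
    if m:
        minutes, seconds = int(m[1]), int(m[2])
        # With 11 seconds or fewer the token is not treated as a duration.
        return (0, minutes, seconds) if seconds > 11 else None
    m = LETTER_RE.fullmatch(token)
    if m is None:
        return None
    parts = m.groups()
    if sum(p is not None for p in parts) < 2:  # assumption: two or more components
        return None
    hours, minutes, seconds = (int(p) if p is not None else 0 for p in parts)
    return hours, minutes, seconds
```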
One-digit numbers for the day and for the month may have an optional leading zero.
Supported formats for month expressions: numbers (4, 04), name (April), abbreviation (Apr).
The year should be expressed with 4 digits.
European format (D/M/Y, D-M-Y, D.M.Y), default for Swedish voices:
12/maj/1995 will be pronounced tolfte maj nitton-hundra-nittiofem.
12-Apr-2007 will be pronounced tolfte april två-tusen-sju.
20.3.2011 will be pronounced tjugonde mars två-tusen-elva.
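To make the European format concrete, the sketch below parses D/M/Y, D-M-Y and D.M.Y tokens into a calendar date. It is an illustration only: `parse_european_date` is a hypothetical name, the same separator is assumed on both sides of the month, and month names are limited to the Swedish names and their three-letter abbreviations.

```python
import re
from datetime import date
from typing import Optional

MONTHS = [
    "januari", "februari", "mars", "april", "maj", "juni",
    "juli", "augusti", "september", "oktober", "november", "december",
]

# Day, separator, month (number, name or abbreviation), same separator, 4-digit year.
DATE_RE = re.compile(r"(\d{1,2})([/.-])([^\W_]+)\2(\d{4})")


def month_number(text: str) -> Optional[int]:
    """Map '4', '04', 'April' or 'Apr' to a month number (range checked later)."""
    if text.isdecimal():
        return int(text)
    name = text.lower()
    for index, full in enumerate(MONTHS, start=1):
        if name == full or name == full[:3]:
            return index
    return None


def parse_european_date(token: str) -> Optional[date]:
    """Return a date for a D/M/Y style token, or None if it is not a valid date."""
    m = DATE_RE.fullmatch(token)
    if m is None:
        return None
    month = month_number(m[3])
    if month is None:
        return None
    try:
        return date(int(m[4]), month, int(m[1]))
    except ValueError:  # e.g. month 13 or 31 February
        return None
```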
US format (M/D/Y, M-D-Y) with month name:
Dec/31/1999 will be pronounced trettio första december nitton-hundra-nittionio.
April-25-1999 will be pronounced tjugo femte april nitton-hundra-nittionio.
ISO 8601 standard (Y-M-D, Y/M/D, Y.M.D), only 4-digit year:
2007/01/01 will be pronounced två-tusen-sju noll ett noll ett.
2007-Jan-01 will be pronounced första januari två-tusen-sju.
2007-Januari-01 will be pronounced första januari två-tusen-sju.
Other common formats:
01/6 -97 will be pronounced första i sjätte nittiosju.
Ivona interprets ranges of time and date.
15-20 April will be pronounced femtonde till tjugonde april.
1939-1945 will be pronounced nitton-hundra-trettionio till nitton-hundra-fyrtiofem.
Most abbreviations will be expanded to full words. There will be no sentence break on the dot sign (full stop) following a supported abbreviation. In order to force a sentence break please use two dot signs: one to mark the abbreviation and one to mark the sentence ending.
Kiruna centrum ligger 550 m ö.h. will be interpreted as kiruna centrum ligger fem-hundra-femtio meter över havet.
Initialisms with a period (dot) following each letter (e.g. E.U., H.D.M.I.) will be pronounced by spelling out each letter.
Most common initialisms without dots (e.g. EU, HDMI) will also be recognized as such and pronounced properly.
All vowelless words are recognized as initialisms.
H.D.M.I. will be pronounced h d m i.
i EU will be pronounced i e u.
IT-branschen will be pronounced i t branschen.
SVT will be pronounced s v t.
pwq will be pronounced p w q.
In most cases Ivona properly recognizes and normalizes street addresses in Sweden.
Stockholm University, SE-10691 Stockholm, Sweden will be pronounced stockholm university, s e ett hundra sex nittioett stockholm, sweden.
Ivona recognizes most Swedish telephone number formats and groups digits in pairs or triplets.
08-501 361 01 will be pronounced as noll åtta fem-hundra-ett tre-hundra-sextioett noll ett.
telefon: 123456 will be pronounced as telefon, tolv trettiofyra femtiosex.
Non-words not described elsewhere will be treated as identifiers. This group includes mixes of letters and digits, such as r121, as well as URLs, e-mail addresses, or unusual proper names unknown to the synthesizer.
Numbers within identifiers such as r121, x01, b987654 will be read as numbers if they consist of up to 4 digits, and will be read as a series of digits otherwise.
Punctuation characters within identifiers will be pronounced.
er125lp will be pronounced er ett-hundra-tjugofem l p.
http://www.ivona.com will be pronounced h t t p kolon snedstreck snedstreck w w w punkt ivona punkt com.
B!0 will be pronounced b utropstecken noll.
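The identifier rules above amount to splitting a token into runs of letters, runs of digits and single punctuation characters, then choosing how each digit run is read. The following sketch shows that split; `classify_identifier` and the kind labels are hypothetical, and the decision whether a letter run is read as a word or spelled out is not modelled.

```python
import re
from typing import List, Tuple

# A run of letters, a run of digits, or one non-alphanumeric character.
RUN_RE = re.compile(r"[^\W\d_]+|\d+|[\W_]")


def classify_identifier(identifier: str) -> List[Tuple[str, str]]:
    """Split an identifier into (text, kind) pairs.

    Digit runs of up to 4 digits are tagged "number" (read as one number),
    longer runs are tagged "digits" (read one digit at a time).
    """
    parts = []
    for run in RUN_RE.findall(identifier):
        if run.isdigit():
            kind = "number" if len(run) <= 4 else "digits"
        elif run.isalpha():
            kind = "letters"
        else:
            kind = "punctuation"  # pronounced, per the rule above
        parts.append((run, kind))
    return parts
```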
The SSML element say-as gives users the possibility to annotate fragments of text in order to force particular interpretation.
Marking a fragment with say-as disables most of the default normalization rules that would otherwise have been applied. It is therefore advisable to use say-as sparingly, only when the default normalization rules fail and produce speech other than what the user expects.
The W3C has published a Working Group Note, SSML 1.0 say-as attribute values, which Ivona mostly follows.
Ivona will interpret a value as a date, when used within say-as with interpret-as="date". This works just as defined in the W3C note. The format attribute may be set to any of the following: mdy, dmy, ymd, md, dm, ym, my, d, m, y.
<say-as interpret-as="date" format="ymd">01/02/03</say-as> will be pronounced tredje i andra noll ett.
<say-as interpret-as="date" format="y">1234</say-as> will be pronounced tolv-hundra-trettiofyra.
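When SSML is generated programmatically, it can help to validate the format attribute against the values listed above before emitting the element. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name `say_as_date` is hypothetical and the snippet only builds the markup, it does not call any Ivona API.

```python
import xml.etree.ElementTree as ET

# Format values listed above for interpret-as="date".
DATE_FORMATS = {"mdy", "dmy", "ymd", "md", "dm", "ym", "my", "d", "m", "y"}


def say_as_date(text: str, fmt: str) -> str:
    """Return a serialized say-as element for a date, rejecting unknown formats."""
    if fmt not in DATE_FORMATS:
        raise ValueError(f"unsupported date format: {fmt!r}")
    element = ET.Element("say-as", {"interpret-as": "date", "format": fmt})
    element.text = text  # special characters are escaped on serialization
    return ET.tostring(element, encoding="unicode")
```

Calling `say_as_date("01/02/03", "ymd")` yields a say-as element equivalent to the first example above.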
By default, a token like 7'10" would be recognized as a length in feet and inches. However, it can be forced to be recognized as a duration in minutes and seconds by enclosing it in a say-as element with interpret-as="time".
<say-as interpret-as="time">2'10"</say-as> will be pronounced två minuter och tio sekunder.
Telephone numbers may be marked with the say-as element having interpret-as="telephone". Digits in a telephone number are grouped in pairs or triplets.
<say-as interpret-as="telephone">1-800-555234</say-as> will be pronounced ett åtta-hundra femtiofem femtiotvå trettiofyra.
Ivona will read individual characters for text within the say-as element having interpret-as="characters". The format attribute is ignored. The detail attribute may be used to force pauses, as described in the W3C Note.
<say-as interpret-as="characters">speed</say-as> will be pronounced s p e e d.
<say-as interpret-as="characters" detail="3 1 2">1a3BZ7</say-as> will be pronounced ett a tre b z sju.
Ivona will attempt to read values within say-as having interpret-as="cardinal" as cardinal numbers. The format and detail attributes are ignored. Roman numerals are supported.
<say-as interpret-as="cardinal">1999</say-as> will be pronounced ett-tusen-nio-hundra-nittionio.
<say-as interpret-as="cardinal">CLI</say-as> will be pronounced ett-hundra-femtioen.
Ivona will attempt to read values within say-as having interpret-as="ordinal" as ordinal numbers. The format and detail attributes are ignored. Roman numerals are supported.
<say-as interpret-as="ordinal">21</say-as> will be pronounced tjugo-första.
<say-as interpret-as="ordinal">VI</say-as> will be pronounced sjätte.
Ivona will interpret values within say-as having interpret-as="fraction" as common fractions. The syntax for fractions is any of the following:
["+" | "−" | "±"] cardinal "/" cardinal.
["+" | "±"] cardinal "+" cardinal "/" cardinal.
"−" cardinal "−" cardinal "/" cardinal.
where cardinal is a number as defined in Cardinal numbers above.
<say-as interpret-as="fraction">2/9</say-as> will be pronounced två niondelar.
<say-as interpret-as="fraction">3+1/2</say-as> will be pronounced tre och en halv.
<say-as interpret-as="fraction">−2−3/8</say-as> will be pronounced minus två och tre åttondelar.
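The three syntax forms for interpret-as="fraction" can be checked with one pattern per form, as sketched below. The sketch follows the grammar literally, so only the minus sign U+2212 is accepted and the hyphen-minus is not. The name `parse_say_as_fraction` is hypothetical and cardinals are simplified to digit runs without thousands separators.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

C = r"(0|[1-9]\d*)"  # simplified cardinal
SIMPLE = re.compile(rf"([+\u2212\u00b1])?{C}/{C}")     # [sign] a/b
MIXED_POS = re.compile(rf"([+\u00b1])?{C}\+{C}/{C}")   # [sign] w+a/b
MIXED_NEG = re.compile(rf"(\u2212){C}\u2212{C}/{C}")   # minus w minus a/b


def parse_say_as_fraction(text: str) -> Optional[Tuple[str, Fraction]]:
    """Return (sign, magnitude) when text matches one of the three forms."""
    for pattern in (SIMPLE, MIXED_POS, MIXED_NEG):
        m = pattern.fullmatch(text)
        if m is None:
            continue
        sign, *numbers = m.groups()
        values = [int(n) for n in numbers]
        if values[-1] == 0:  # zero denominator
            return None
        whole = values[0] if len(values) == 3 else 0
        return sign or "", whole + Fraction(values[-2], values[-1])
    return None
```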
Measurements may be marked with say-as having interpret-as="unit" (or interpret-as="measure"). The valid syntax is the following:
symbol [ "2" | "3" | "4" | "²" | "³" ] [ "/" unit ]
number "-" unit
A unit symbol may be almost any of the standard metric, imperial or other unit symbols, e.g. N (newtons), kJ (kilojoules), mi (miles), sqft (square feet), MiB (mebibytes), ly (light years), tbsp (tablespoons), °F (degrees Fahrenheit), psi (pounds per square inch), etc. The unit name does not contain periods (dots). In general the unit symbols are case sensitive, so B is bytes and b is bits, but unambiguous symbols are matched case-insensitively, so that either the proper Hz or improper hz, HZ and hZ will all be treated as the frequency unit hertz.
The SI prefixes as well as binary prefixes may be prepended to unit symbols, if appropriate.
A unit symbol may be suffixed with a power like 2 or ³, so that m² is square meters and s³ is seconds cubed.
<say-as interpret-as="unit">2nmi</say-as> will be pronounced två nautiska mil.
<say-as interpret-as="unit">1+1/2tsp</say-as> will be pronounced en och en halv tesked.
<say-as interpret-as="unit">5m/s2</say-as> will be pronounced fem meter per kvadratsekund.
<say-as interpret-as="unit">2.100rpm</say-as> will be pronounced två-tusen-ett-hundra varv per minut.
<say-as interpret-as="unit">2,7µF</say-as> will be pronounced två komma sju mikrofarad.
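The case-sensitivity rule for unit symbols can be pictured as a two-step lookup: an exact match first, then a case-insensitive match for symbols that are unambiguous. The tables below are tiny, hypothetical stand-ins that contain only the symbols named in this section; they are not Ivona's real unit inventory.

```python
from typing import Optional

# Symbols whose meaning depends on case.
CASE_SENSITIVE_UNITS = {"B": "bytes", "b": "bits"}
# Unambiguous symbols, stored in lowercase and matched case-insensitively.
CASE_INSENSITIVE_UNITS = {"hz": "hertz"}


def lookup_unit(symbol: str) -> Optional[str]:
    """Resolve a unit symbol, trying the exact spelling before ignoring case."""
    if symbol in CASE_SENSITIVE_UNITS:
        return CASE_SENSITIVE_UNITS[symbol]
    return CASE_INSENSITIVE_UNITS.get(symbol.lower())
```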
Street addresses or parts of an address may be marked with say-as having interpret-as="address". This will force special pronunciation of Swedish postal codes (grouping them into three plus two digits).
<say-as interpret-as="address">Alphyddevägen 55, 13135 Nacka</say-as> will be pronounced alphyddevägen femtiofem, ett-hundra-trettioett trettiofem nacka.
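The three-plus-two grouping of Swedish postal codes is easy to sketch. The function name is hypothetical, and accepting a code that is already written with a space is an assumption.

```python
import re
from typing import Optional, Tuple


def group_postal_code(code: str) -> Optional[Tuple[str, str]]:
    """Split a five-digit postal code into a three-digit and a two-digit group."""
    m = re.fullmatch(r"(\d{3}) ?(\d{2})", code)
    return (m[1], m[2]) if m else None
```

For 13135 this gives the groups 131 and 35, which matches the reading ett-hundra-trettioett trettiofem in the example above.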
The role attribute of w and token elements in an SSML document may be used to choose particular pronunciation of homographs. The possible values of this attribute are the following:
ivona:ABBR — Interpret the word as an abbreviation,
ivona:ACR — Interpret the word as an acronym,
ivona:AJ — Interpret the word as an adjective,
ivona:AV — Interpret the word as an adverb,
ivona:CJ — Interpret the word as a conjunction,
ivona:DT — Interpret the word as a determiner.
ivona:FW — Interpret the word as a foreign word,
ivona:IE — Interpret the word as an infinitive marker.
ivona:NM — Interpret the word as a number,
ivona:NMO — Interpret the word as an ordinal number,
ivona:NN — Interpret the word as a noun.
ivona:NNP — Interpret the word as a proper noun,
ivona:PN — Interpret the word as a pronoun,
ivona:PP — Interpret the word as a preposition,
ivona:RBR — Interpret the word as an interrogative word,
ivona:RP — Interpret the word as a particle,
ivona:VB — Interpret the word as a verb.
ivona:VBD — Interpret the word as a participle,
ivona:DEFAULT — Use the default sense of the word.
ivona:SENSE_1, ivona:SENSE_2, ivona:SENSE_3, ivona:SENSE_4 — Use the non-default sense of the word, which has a different pronunciation.
In most cases, however, Ivona properly chooses the pronunciation of an ambiguous word, and it doesn’t need to be explicitly marked.
<w role="ivona:NN">man</w> will be pronounced /ˈman/, as in Där går två män och en kvinna.
<w role="ivona:SENSE_1">man</w> will be pronounced /ˈmɑːn/, as in Hästens man glänste i solen.
<w role="ivona:NN">sky</w> will be pronounced /ɧyː/.
<w role="ivona:FW">sky</w> will be pronounced /skaɪ/.
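A small helper can guard against misspelled role values when w elements are generated by a program. The sketch below builds the element with Python's standard xml.etree.ElementTree module and checks the role against the values listed above; `w_with_role` is a hypothetical name and no Ivona API is involved.

```python
import xml.etree.ElementTree as ET

# Role names from the list above, without the "ivona:" prefix.
ROLES = {
    "ABBR", "ACR", "AJ", "AV", "CJ", "DT", "FW", "IE", "NM", "NMO", "NN",
    "NNP", "PN", "PP", "RBR", "RP", "VB", "VBD", "DEFAULT",
    "SENSE_1", "SENSE_2", "SENSE_3", "SENSE_4",
}


def w_with_role(word: str, role: str) -> str:
    """Return a serialized w element with role="ivona:<role>"."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    element = ET.Element("w", {"role": f"ivona:{role}"})
    element.text = word
    return ET.tostring(element, encoding="unicode")
```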
As mentioned at the very beginning of this text, it is sometimes necessary to modify texts to be synthesized in order to make them compatible with the system constraints and achieve the expected output. Ivona provides a set of special characters that work only in certain contexts, changing the way texts are synthesized in terms of pronunciation or intonation. The characters are language-specific and do not apply to other languages unless specified otherwise in the language-specific documentation.
A question mark followed by a caret, also known as a circumflex (?^), can be used to force the intonation of a question to rise. By default, wh-questions (questions starting with an interrogative pronoun) have falling intonation. This can be changed by appending a caret to the question mark.
Hur mår du?^ will result in a rising intonation.
A question mark followed by an underscore (?_) can be used to force the intonation of a question to fall. Yes/No questions by default have a rising intonation. This can be changed by appending the underscore character to the question mark.
Är du okej?_ will result in a falling intonation.
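The two intonation markers can be detected at the end of a sentence as sketched below. The function name and the returned labels are hypothetical. For an unmarked question the sketch returns "default", since the choice between rising and falling then depends on the question type, which is not modelled here.

```python
from typing import Optional


def forced_intonation(sentence: str) -> Optional[str]:
    """Report the intonation requested by a trailing ?^ or ?_ marker.

    Returns "rising", "falling", "default" for a plain question mark,
    or None when the sentence is not a question.
    """
    stripped = sentence.rstrip()
    if stripped.endswith("?^"):
        return "rising"
    if stripped.endswith("?_"):
        return "falling"
    if stripped.endswith("?"):
        return "default"
    return None
```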