TTS - Internal Or External



#1 Guest_Hugh_* 09 July 2003 - 09:22 AM

I need to design an application and will base it around VG. There is a requirement to do a TTS conversion; at present I will either use AT&T Natural Voices or Rhetorical (more expensive, but clearer voice). It strikes me I have a choice. A VB or VC script will gather the text to recite. Either this text is left for VG to pick up, in which case I need the Enterprise version and have to link in the chosen TTS engine,
or the VB or VC script gets the text, generates a wav file using the TTS engine, and VG then picks up the appropriate wav file to play.

Are there any issues, advantages or disadvantages with either approach? My main concern would be speed: I do not want to delay or hold up the call whilst the TTS is processed. It may be possible to get the text and pre-process the wav file before the call is received, but then again, this might not be an issue.

Suggestions and reasons please.
Thanks
Hugh
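
[Editor's sketch] The second option described above (generate the wav file ahead of the call and let VG play it) could look roughly like the following. This is only an illustration under stated assumptions: the function names, the cache directory layout and the `synthesize` callback are hypothetical, and the actual call into a TTS engine is deliberately left to the caller because the thread does not show one.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_wav_path(text: str, cache_dir: Path) -> Path:
    """Map a piece of text to a stable wav file name (hypothetical naming scheme)."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return cache_dir / f"tts_{digest}.wav"


def ensure_wav(text: str, cache_dir: Path,
               synthesize: Callable[[str, Path], None]) -> Path:
    """Return the wav file for `text`, generating it only if it is missing.

    `synthesize(text, target)` is an assumed callback that invokes whatever
    TTS engine is in use and writes the audio to `target`.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cached_wav_path(text, cache_dir)
    if not target.exists():
        synthesize(text, target)
    return target
```

Run before the expected calls, this pays the generation cost once per distinct text; at call time the script only has to hand the returned file name to the IVR.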

#2 SupportTeam 09 July 2003 - 11:19 AM

I guess that when using external programs to generate the TTS wav files the overall time it would take before the TTS can be spoken would be longer (unless you pre-process of course)...

You can experiment using the evaluation version of VoiceGuide - the TTS is enabled in the evaluation version...
Can you say what pricing you are getting on both engines? There have been a few pricing questions from other readers before...

#3 SupportTeam 09 July 2003 - 01:41 PM

I was just told that the AT&T Natural Voices TTS engine costs US$35 and can be purchased from:

https://www.regsoft....productid=56116


I understand that AT&T Natural Voices is regarded as the leader at this stage although I must say the Rhetorical samples did sound good when I just listened to them on their web site ( http://www.rhetorical.com/voices.html ) - mind you they were pre-recorded so it's pretty hard to compare them head-to-head with the AT&T sample page ( http://elvis.naturalvoices.com/demos/ ) which allows you to type in what you want spoken...

#4 Guest_Hugh_* 10 July 2003 - 01:15 AM

Pricing of speech engines seems to change depending on the application, how you present it and who you speak to. The AT&T engine with 16kHz Audrey (UK voice) is about 100+ including the Read Please application. This would appear to be available only via mail order (you cannot download the higher quality voices from the website). For what you get it is good value and the quality is good, but it can be tricked depending on the text being fed into it.

Rhetorical is the best from what I have seen so far. It is less of an app (like Read Please) and more of an SDK. I have not yet fully evaluated the SDK, but Rhetorical as a company have been extremely helpful and responsive. Pricing seems to be a sensitive issue here, and there are many license options which one tends to think have more benefit to the salesperson's commission than the customer's pocket. The same appears to be true of Scansoft and their engine (originally L&H). Both seem to quote around 750 Euro per line and are clearly licensed to be used in a live environment.

That being the case, pricing is easy, but my application is such that I know who is going to call, before they call and could therefore pre-process text into a wav file. That way I may have redundant files (the person does not call), but I would only need to process once and since this could be done sequentially, I would technically only be using one license. The response to this though is that you need a server license which costs significantly more and for a small number of lines (2-4) it is more cost effective to work live.

Has anyone integrated the Rhetorical engine into VG or has any experience of this?

#5 SupportTeam 10 July 2003 - 02:34 AM

Have you checked out the link from the previous post:

https://www.regsoft....productid=56116

You can purchase the 16kHz AT&T engine there (Mike+Crystal voices) at US$35+shipping...
Audrey (16khz UK English) Voice is also available there for an additional US$35+shipping...

After installing them on the PC and selecting the AT&T voice as the preferred voice using the "Speech" applet in the Control panel you can use them from VoiceGuide...

I understand that as they are used by one app and requests are sequential (VoiceGuide takes very little time to generate TTS Wav file, so TTS generation is done sequentially) then you only need one license...

At US$70 for Mike+Crystal+Audrey voices that makes it a lot cheaper than Rhetorical...

#6 Guest_Hugh_* 10 July 2003 - 08:52 AM

I have tried linking the AT&T 16k Audrey voice to VG and it works, but it introduces a considerable delay between the call being answered and the first text being spoken. I am using a single step in the script which contains 2 sentences of text (about 30-40 seconds worth). Admittedly, the machine this is being tested on is not very powerful (PII 333), but the delay is almost 20 seconds between the call being answered and the speech starting. Is this normal, or is there something wrong (apart from needing a faster platform)?

#7 SupportTeam 10 July 2003 - 01:09 PM

I'd say that the reason for the slow response time was the machine used. How much memory did it have? Judging by the time taken there must have been some serious disk swapping...
AT&T asks for a machine with at least 256MB RAM.

If you use an appropriate machine then you get very good response times.
Keep in mind that at the very first call into the machine after starting VoiceGuide the first TTS action will take a bit longer as it takes some time to load all the libraries into memory for the first time.

Here is a test of AT&T TTS I just did on a 866MHz PIII with 384MB RAM (a low end Compaq Deskpro EN).

The Script answers the call and plays:
QUOTE
Welcome to Katalina Technologies, Please press 1 for the sales department, 2 for accounts.
If you have a technical support questions please see the knowledge base section on our web page,
or post a question on our support forum which is regularly monitored by our support staff.
Alternatively you may also press 3 to leave us a voicemail message, but please be sure to include your
email address in the message as any pre-sales support questions are answered by email only.

resulting sound file is 27 seconds long.


On the very first call it did take 7 seconds to create the sound file and start playing it:

153501.69 7 [Play TTS test] Playing
153501.69 7 tts generate start[...]
153508.98 7 tts generate end
153508.98 7 [Play TTS test] Playing (none, C:\vg\system\tts7.wav)


but on the second and following calls it took less than one second to generate a 27 second sound file and start playing it:

153545.94 7 [Play TTS test] Playing
153545.94 7 tts generate start[...]
153546.84 7 tts generate end
153546.84 7 [Play TTS test] Playing (none, C:\vg\system\tts7.wav)

153622.95 7 [Play TTS test] Playing
153622.95 7 tts generate start[...]
153623.86 7 tts generate end
153623.86 7 [Play TTS test] Playing (none, C:\vg\system\tts7.wav)

153659.69 7 [Play TTS test] Playing
153659.69 7 tts generate start[...]
153700.56 7 tts generate end
153700.56 7 [Play TTS test] Playing (none, C:\vg\system\tts7.wav)
153700.59 7 PlaySoundStart ok [C:\vg\system\tts7.wav]

153735.73 7 [Play TTS test] Playing
153735.73 7 tts generate start[...]
153736.63 7 tts generate end
153736.63 7 [Play TTS test] Playing (none, C:\vg\system\tts7.wav)
153736.66 7 PlaySoundStart ok [C:\vg\system\tts7.wav]
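
[Editor's sketch] The generation times quoted above can be read off the trace by pairing each "tts generate start" line with the following "tts generate end" line. The snippet below shows one way to do that. It assumes the leading number on each line is a wall-clock time in HHMMSS.ss form, which is an inference from the excerpt (one entry rolls over from 153659.69 to 153700.56) rather than a documented log format; the function names are hypothetical.

```python
import re

# Three start/end pairs copied from the trace above.
LOG = """\
153501.69 7 tts generate start[...]
153508.98 7 tts generate end
153545.94 7 tts generate start[...]
153546.84 7 tts generate end
153659.69 7 tts generate start[...]
153700.56 7 tts generate end
"""

# Assumed layout: HHMMSS.ss, line number, then the message text.
LINE_RE = re.compile(r"^(\d{2})(\d{2})(\d{2}\.\d+)\s+\d+\s+tts generate (start|end)")


def to_seconds(hours: str, minutes: str, seconds: str) -> float:
    """Convert the split timestamp fields to seconds since midnight."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def tts_durations(log_text: str) -> list[float]:
    """Return the elapsed seconds for each start/end pair, in log order."""
    durations = []
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a TTS generation line
        hours, minutes, seconds, kind = match.groups()
        stamp = to_seconds(hours, minutes, seconds)
        if kind == "start":
            started = stamp
        elif started is not None:
            durations.append(round(stamp - started, 2))
            started = None
    return durations


print(tts_durations(LOG))
```

For the lines shown this gives roughly 7.3 seconds for the first call and a little under one second for the later ones, in line with the figures in the post.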


The created sound file tts7.wav can be downloaded from: http://www.voiceguid...upport/tts7.wav

Attached is the screenshot showing the Performance Monitor trace of all calls.
For the first 3 calls the update speed was set to High, and for the last two calls the update speed was set to Normal - hence the difference in how the CPU usage spikes are shown.

[21 Sept 2006 : Updated link to WAV file]
  • TTStests.gif

Edited by SupportTeam, 21 September 2006 - 12:24 PM.


#8 Patrick Hurrelmann 10 July 2003 - 10:09 PM

we are using voiceguide and at&t natural voices (german klara 8khz and 16khz and english 8khz)

voice quality is brilliant, though i can't compare it to competitors.

our ivr-app runs on a dual athlon mp 1600+ with 1gb of ram (dialogic d4epci and 4port active isdn card for fax application) ... and it is damn fast ;)

i'm using tts only on parts of the script where data retrieved from a local mssql has to be spoken. all other voice files are recorded with the same language and microsoft's ttsapp. but i found that microsoft's vb tts application gives better sound quality... (this may be an illusion).

#9 SupportTeam 11 July 2003 - 02:12 PM

VoiceGuide will create the TTS sound files in the Dialogic preferred format (11kHz 8bit Mono).

If Microsoft's sample app creates sound files with a higher frequency (say 16kHz) then when playing both sound files on the PC speaker the higher frequency sound file will sound better.

If Microsoft's sample app creates sound files in the same format (11kHz 8bit Mono) then both the sound files should sound the same - as the same function in the SAPI engine is used to create the sound file.

If Microsoft's sample app creates sound files with a higher frequency (say 16kHz) and that file is then played with VoiceGuide to the caller over a Dialogic card - the sound file will first be converted on-the-fly to the Dialogic preferred format (11kHz 8bit Mono) - and due to this format conversion it may sound a bit different than a TTS sound file which was created in 11kHz 8bit Mono format in the first place...

(On-the-fly format conversion works only if format converters have been installed - if they have not then the play will fail. To avoid relying on format conversion to play sound files we recommend only using sound files in the format preferred by the platform used - PCM 11kHz 8bit Mono for Dialogic and PCM 8kHz 16bit Mono for voice modems)
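
[Editor's sketch] One way to avoid depending on on-the-fly conversion is to check each wav file's header before it is used in a script. The example below uses Python's standard `wave` module. The platform table simply restates the two formats named in the post; reading "11kHz" as 11025 Hz is an assumption, and the function and dictionary names are hypothetical.

```python
import wave

# (sample rate in Hz, bits per sample, channels), as quoted in the post.
# 11025 Hz is assumed to be what "11kHz" refers to.
PREFERRED = {
    "dialogic": (11025, 8, 1),
    "modem": (8000, 16, 1),
}


def wav_format(path: str) -> tuple[int, int, int]:
    """Read sample rate, bit depth and channel count from a PCM wav header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def matches_platform(path: str, platform: str) -> bool:
    """True if the file is already in the format preferred by `platform`."""
    return wav_format(path) == PREFERRED[platform]
```

A file for which `matches_platform(path, "dialogic")` is false would need converting (or regenerating in the right format) before being played without format converters installed. Note that the `wave` module is meant for uncompressed PCM files and may raise `wave.Error` on other encodings.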

#10 Patrick Hurrelmann 11 July 2003 - 05:40 PM

in microsoft sample tts app you can configure every bitrate and hz or khz as you like...

#11 SupportTeam 11 July 2003 - 07:57 PM

OK, found the sample VB TTS app supplied when SAPI 5.1 is installed - and proceeded to use this app to create a sound file which says the same 27 second text - here is the link to the file created with the sample VB TTS app:

http://www.voiceguid...bTtsAppTest.wav

I played both files and they sound identical - and a quick look at them in the sound editor software suggests that they are indeed identical...

We're pretty sure VoiceGuide will generate the same sound files as the demo app - it uses the same function calls to the TTS engine that the sample app uses.

[21 Sept 2006 : updated link to WAV file]

Edited by SupportTeam, 21 September 2006 - 12:20 PM.


#12 Guest_TTS or PreRecord Using "Tal_* 02 March 2004 - 10:53 PM

TTS gives much more flexibility, especially when starting out and you're not sure what final application layout you'll achieve nor all the twists and turns your scripts will take in getting there. If you choose pre-recording using voice talents you have to be absolutely sure of your exact scripting needs and it is very, very expensive. Avoid it if you're just starting out with IVR and learning "on the job" as we are.

The Microsoft voices are unusable in a UK environment and possibly anywhere else in the world where an English language based app is being developed.

Rhetorical's product is undoubtedly excellent, their support is very good but it is expensive with licences being sold per simultaneous call capability (if you can take up to 4 calls, you'll need 4 licences and so on) plus the run time stuff itself which is licensed per developer seat. If you've got a Dialogic 4PCI/Euro you're looking at start-up costs of about 3,600 (including first year's support) and running costs for annual support of approx 1,600.

AT&T's voices and engines are very good in a UK environment with Audrey being a slightly better choice than Charles, who speaks a bit like Americans think the British do. Sadly the formats are wrong with both voices being in 16kHz format; modems need 8kHz and the Dialogic 11.025kHz.

On a modem Audrey gets "downsampled" and sounds terrible. I haven't tried her on the Dialogic kit yet but downsampling will again apply albeit at a slightly closer rate.

We bought our AT&T stuff through NextUp (NextUp.com) for about 100 (AT&T Engine plus 4 voices; Mike, Crystal - free with engine - and Charles and Audrey 20 each) and the company has provided excellent support. Their demos are superb with the only disappointment being the subsequent inability to reproduce Audrey over a modem or Dialogic card the way she sounds on their demo, which is simply brilliant. Don't be fooled!

Until the voice formats get sorted out - both Microsoft and Dialogic choosing different, low spec formats is a real nuisance - text to speech will be the poor relation to pre-recorded prompts in terms of quality but the flexibility you get - and the money saved - cannot be overlooked.

#13 SupportTeam 03 March 2004 - 06:02 AM

QUOTE
On a modem Audrey gets "downsampled" and sounds terrible. I haven't tried her on the Dialogic kit yet

Many voice modems have very poor sound quality - I'd recommend trying a Dialogic card.

In one of the posts above there is a link to the WAV file which was generated using AT&T Mike voice for use with Dialogic - so it is already downsampled to 11kHz/8bit.

http://www.voiceguid...upport/tts7.wav

I'd recommend listening to this file through a Dialogic card - most people find the quality of AT&T quite acceptable when used on Dialogic cards.

[21 Sept 2006 : Updated link to WAV file]

Edited by SupportTeam, 21 September 2006 - 12:21 PM.


#14 Guest_Digital Comfort_* 22 July 2004 - 03:08 PM

I visited AT&T's Natural Voices Site and it shows me 3 products but nowhere does it show me where to purchase Natural Voices engine. Just other software that uses their technology. Where do I go to purchase the $35.00 version?

Erich
Digital Comfort

#15 Guest_Digital Comfort_* 22 July 2004 - 03:18 PM

Sorry I found it... Further up on the posts. Thank you anyway.

Erich
Digital Comfort

#16 amolinero 23 July 2004 - 03:05 AM

Hi everybody.
Does anybody know where I can find a cheap Castilian Spanish TTS voice?
I have tried Loquendo, Verbio, Rhetorical, Elan Speech, Scansoft... but the cheapest license is not less than 500 euros.
It makes me sad to see the AT&T license price... 35

thanks in advance...

#17 Guest_Roy Jensen_* 22 August 2004 - 05:40 AM

I just tried the following text using both the AT&T and Rhetorical TTS websites. Rhetorical was by far the best in terms of speed and clarity. Furthermore, Rhetorical attempted (successfully) words not in its vocabulary, specifically myAVON. AT&T spelt it out.

"Hello, this is Roy Jensen with myAVON.ca. I am calling to inform you of the new features available to Avon Representatives and their customers at www.myAVON.ca."

my 0.02

SMILE!
Roy Jensen

#18 Guest_Geoff_* 20 June 2005 - 01:36 PM

None of the sample sound files can be found on the server at those links. Are they still available for audition?

Thanks
Geoff

#19 SupportTeam 20 June 2005 - 03:41 PM

Looks like the sample sound files referred to in previous posts have since been deleted. Which one do you require?

[21 Sept 2006 : Links to WAV files have now been updated]

Edited by SupportTeam, 21 September 2006 - 12:31 PM.


#20 meiotic 10 April 2006 - 12:38 PM

Price for voices. I purchased Audrey from AT&T for $35+$35 (1 voice and base package) but it won't install on a server, only on a desktop. Where should I go to get a server based package, or am I installing it wrong?
Where every call counts
www.theycalled.com

#21 SupportTeam 10 April 2006 - 01:25 PM

You should speak about this to the provider/reseller of the purchased TTS engine.

#22 SupportTeam 21 September 2006 - 04:05 PM

Here is an 8kHz AT&T sample, which can be used in VoiceGuide for Dialogic to hear what the AT&T Natural Voices TTS sounds like over the phone line.

TTS was generated from this website: http://www.research....eb/tts/demo.php

The .ZIP includes both the original generated file and the 8kHz version, which we created using Audacity by saving the file in 8kHz/8bit format.

Voice used was "Crystal - US English" and the TTS text input was:

Welcome to Katalina Technologies, Please press 1 for the sales department, 2 for accounts.
If you have a technical support questions please see the knowledge base section on our web page,
or post a question on our support forum which is regularly monitored by our support staff.
Alternatively you may also press 3 to leave us a voicemail message, but please be sure to include your
email address in the message as any pre-sales support questions are answered by email only.


The website based demo generated WAV goes silent after about 20 seconds - this is the limitation of the website based demo generator itself.