you can't use the actual mac tts on windows but with a bit of searching i found an alternative that creates pretty much the same output, it's just a little bit more difficult to operate
there's this open source formant speech synthesizer called eSpeak that you can install on windows
once it's installed you use cmd to cd into the program directory (in program files by default) and then into the command_line folder, and from there you can do espeak "hello world" to make it say whatever you want
for the whisper voice you would want to do espeak -ven-us+whisper "utopia" and replace utopia with whatever you want to say
-ven-us sets the voice to en-us, and the +whisper sets the variant of the voice to whisper
you can also add things like -w filename.wav, which exports the speech to a .wav rather than speaking it, or alter the speed and pitch of the voice, etc. there's more information on the commands in the documentation
it seems a bit long-winded but it works just as well, i hope that helps!
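If typing the flags by hand gets tedious, the options above can be collected in a small script. This is only a rough sketch: `build_espeak_args` is a made-up helper name, not part of eSpeak, and it assumes the usual flags `-v` (voice), `-w` (write a .wav), `-s` (speed in words per minute) and `-p` (pitch, 0 to 99).

```python
from typing import List, Optional


def build_espeak_args(
    text: str,
    voice: str = "en-us",
    variant: Optional[str] = None,
    wav_path: Optional[str] = None,
    speed_wpm: Optional[int] = None,
    pitch: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the espeak command line (hypothetical helper)."""
    # A variant is appended to the voice with "+", e.g. "en-us+whisper".
    voice_spec = f"{voice}+{variant}" if variant else voice
    args = ["espeak", "-v", voice_spec]
    if wav_path is not None:
        args += ["-w", wav_path]  # write a .wav file instead of speaking
    if speed_wpm is not None:
        args += ["-s", str(speed_wpm)]  # speed in words per minute
    if pitch is not None:
        args += ["-p", str(pitch)]  # pitch, 0 to 99
    args.append(text)
    return args
```

The resulting list can be handed to `subprocess.run(...)` from the folder that contains the espeak executable, for example `subprocess.run(build_espeak_args("utopia", variant="whisper"), check=True)`.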
The pronunciation for some languages (Esperanto, Armenian, Catalan, Romanian, Albanian, Icelandic) sounds like it is generated by eSpeak, so it probably didn't take a lot of effort to implement.
It is an enormous undertaking to get that kind of converter resource set up. Dictionaries would be your best solution. Failing that, the next best solution is probably to use a text-to-speech system and extract its phonetic representation, which is based on grapheme to phoneme rules. It's not going to be 100% accurate, though.
With eSpeak, for example, you could make a query like
espeak -q --ipa -v "en-us" "Here is some text to transcribe"
which will give you /hˈɪɹ ɪz sˌʌm tˈɛkst tə tɹænskɹˈaɪb/. The trick is that it's going to be based on the voice that you give it with the -v flag, which may introduce some oddities. For example, running
espeak -q --ipa -v "en-us" "cot caught"
produces /kˈɑːt kˈɔːt/, which is not reflective of General American English pronunciation.
I mention this so you're aware of your options and can decide for yourself if this is acceptable accuracy for your purposes. If you're interested, you can download eSpeak and run it in a terminal. Or, you can access it with Praat through its SpeechSynthesizer interface, which can write the transcription to one of its TextGrids.
EDIT: fixing typos and adding forgotten words
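For anyone who wants to script the query above rather than type it, here is a minimal sketch. The function names `ipa_command` and `transcribe` are my own, and the `run` parameter is only there so the call can be swapped out where eSpeak isn't installed; the flags are the same `-q --ipa -v` shown in the post.

```python
import subprocess
from typing import Callable, List


def ipa_command(text: str, voice: str = "en-us") -> List[str]:
    """Build the espeak call: -q is silent, --ipa prints IPA, -v picks the voice."""
    return ["espeak", "-q", "--ipa", "-v", voice, text]


def transcribe(text: str, voice: str = "en-us", run: Callable = subprocess.run) -> str:
    """Return eSpeak's IPA output for text, with surrounding whitespace removed."""
    result = run(ipa_command(text, voice), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

As noted above, the transcription depends on the voice passed in, so treat the output as an approximation to check against a dictionary where accuracy matters.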
Nice post, but it's sorely missing examples! How does it turn out?
For generating rhyming words, I stole an idea from @cmyr, who runs a few amusing Twitter accounts (one finds tweets that are anagrams, another composes haikus, etc.): He used the espeak voice synthesizer to generate a list of phonemes from a word.
Here's my code, which is not a complete example for finding rhyming words, but invokes espeak to convert Greek words to phonemes and caches them for speed.
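The original code isn't reproduced here, so what follows is not the poster's implementation, just a rough sketch of the same idea: ask espeak for phonemes and remember the answer per word. It assumes espeak's `-x` flag (print phoneme mnemonics) and the `el` voice for Greek; `espeak_phonemes` and `PhonemeCache` are hypothetical names.

```python
import subprocess
from typing import Callable, Dict


def espeak_phonemes(word: str, voice: str = "el") -> str:
    """Ask espeak for phoneme mnemonics (-x) without playing audio (-q)."""
    result = subprocess.run(
        ["espeak", "-q", "-x", "-v", voice, word],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


class PhonemeCache:
    """Remember phonemes per word so the lookup runs at most once for each."""

    def __init__(self, lookup: Callable[[str], str] = espeak_phonemes) -> None:
        self._lookup = lookup
        self._cache: Dict[str, str] = {}

    def get(self, word: str) -> str:
        # Only call the (slow) lookup for words not seen before.
        if word not in self._cache:
            self._cache[word] = self._lookup(word)
        return self._cache[word]
```

Comparing the tails of two cached phoneme strings would be one way to guess at rhymes, but that part is left out here.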
It's a bit Rube Goldberg, but I use an Emporia Energy Monitor with a script on a Linux box that checks their semi-documented API via PyEmVue for kWh usage over time for the washer and dryer. The script guesses when the machines are active and inactive via their energy use. It's pretty much "if kWh per 10 minutes > 0.25 then it's on, else it's off" or whatever.
When it figures the machine was active and is now inactive that indicates the cycle is over. Then I have an Echo sitting next to the speaker of the Linux box and use espeak to have the Echo announce the laundry is done. The script basically talks to the Echo and says "Alexa, make an announcement" and then "The washing machine is done"
The API changes every now and then and so things have been breaking. Recently the minute interval electricity statistics seem to have stopped working so now I need to recalibrate the script for 15 minute intervals.
I'd rather access the energy monitor directly, or perhaps better, just have some sort of induction coils that can track electricity usage and interface directly with a Raspberry Pi. But the contraption I created seemed to be the easiest.
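The on/off guess described above might look roughly like this. It is a sketch, not the actual script: it leaves out the PyEmVue calls entirely and just takes a list of per-interval kWh readings, and the 0.25 threshold is the ballpark figure from the post. The function names are made up.

```python
from typing import Iterable, List

# Rough per-interval threshold from the post; recalibrate if the interval changes.
ON_THRESHOLD_KWH = 0.25


def finished_cycles(usage_kwh: Iterable[float], threshold: float = ON_THRESHOLD_KWH) -> List[int]:
    """Return the indices of intervals where the machine went from on to off."""
    finished = []
    was_on = False
    for i, kwh in enumerate(usage_kwh):
        is_on = kwh > threshold
        if was_on and not is_on:
            finished.append(i)  # active last interval, idle now: cycle is over
        was_on = is_on
    return finished


def announcement_phrases(machine: str) -> List[str]:
    """Phrases to hand to espeak, one at a time, for the Echo to pick up."""
    return ["Alexa, make an announcement", f"The {machine} is done"]
```

Each phrase would then be spoken with something like `subprocess.run(["espeak", phrase])`, with a pause in between so the Echo has time to respond.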
I have no idea what's best, but I use espeak because it's a small C program that doesn't have many dependencies. It also sounds better than Festival.
#!/bin/bash
xsel | espeak
http://espeak.sourceforge.net/
I use eSpeak by TTSAPP (I just call the program TTS App since that's how it shows up) for narration when I can't be bothered to/when I can't get my voice sounding how I want to. You can output any text you input to a .wav sound file. Works pretty well for me, has a few different voice options, not sure if you can get more because I'm lazy and haven't researched it much.
As for the toast always burning, not a damn clue. Some of my friends have the issue, and it has come up for me a few times. For the video, I cut a burger bun to try and look like bread since I didn't have any sliced loaves of bread around and then proceeded to max out the toaster settings and put it in. A minute or two of filming later, I smelled horrid burning and manually ejected the toast, giving me the disgusting burnt thing I proceeded to put in my mouth and chew on.
I'm glad you enjoyed it! I hope you get some use out of whatever ramblings I put here!
The default text-to-speech reader "Narrator" available in Windows is useful if you need it to read the entire document and all of the menu options, drop-down menus, etc. However, it is not an exact replacement for the Mac's 'say' command.
There are a few options available, including an open source synthesizer called eSpeak, which may work for what you want. Jampal, another text-to-speech program, may also work for you. Quick note though: I haven't tested either of these options.
you mean a 'text to speech' program that can output to a file?
https://opensource.com/article/18/12/linux-toy-espeak
http://espeak.sourceforge.net/
> Can produce speech output as a WAV file.
I've checked the range of supported languages for Dolphin Supernova and they don't support Afrikaans:
https://yourdolphin.com/product/languages?pid=4
But this website, https://www.webbie.org.uk/links.htm, mentions eSpeak SAPI5 voices:
> These free voices can be used with the Thunder or NVDA screenreaders. Includes Afrikaans, Bosnian, Czech, Greek, Esperanto, Finnish, Croatian, Hungarian, Kurdish, Romanian, Slovak, Serbian, Swedish, Swahili, Tamil, Turkish.
However, the eSpeak SourceForge page carries these caveats:
> Languages. The eSpeak speech synthesizer supports several languages, however in many cases these are initial drafts and need more work to improve them. Assistance from native speakers is welcome for these, or other new languages. Please contact me if you want to help.
> eSpeak does text to speech synthesis for the following languages, some better than others.
say comes with macOS, and espeak is available for Linux and Windows. It sounds like either will do what you want; you'll just have to press enter after each word/when you want it to read.
Mac
Mac has a say command
say "Hi, how are you?"
The command has several options. E.g. to use a specific voice
say -v Alex "Hi, how are you?"
To save the output to an aiff file (you can convert it to mp3 or other formats using various tools):
say "Hi, how are you?" -o hello.aiff
To read the text from a text file (test.txt), and save it to an audio file:
say -f test.txt -o hello.aiff
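If you want to drive these options from a script, a small wrapper like the one below is one way to do it. This is just a sketch: `build_say_args` is a hypothetical helper, and it only covers the `-v`, `-f` and `-o` options shown above.

```python
from typing import List, Optional


def build_say_args(
    text: Optional[str] = None,
    voice: Optional[str] = None,
    input_file: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the macOS say command (hypothetical helper)."""
    # The text comes either from the command line or from a file, not both.
    if (text is None) == (input_file is None):
        raise ValueError("give exactly one of text or input_file")
    args = ["say"]
    if voice is not None:
        args += ["-v", voice]  # use a specific voice, e.g. Alex
    if input_file is not None:
        args += ["-f", input_file]  # read the text from a file
    if output_file is not None:
        args += ["-o", output_file]  # save the audio, e.g. hello.aiff
    if text is not None:
        args.append(text)
    return args
```

On a Mac, `subprocess.run(build_say_args(input_file="test.txt", output_file="hello.aiff"), check=True)` would correspond to the last command above.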
Linux, Windows, and MacOSX
Use espeak. E.g.
echo "Hi how are you" | ./speak
I downloaded a binary named speak for Mac. If you are using Linux, it may already be installed on your machine under the name "espeak"
echo "Hi how are you" | espeak
I'd add Artha and eSpeak to that list. Artha is a fantastic English thesaurus with hotkey support for quick access in other programs. eSpeak is great for having your writing read back to you.
> If someone could Snap Artha that'd be cool. Most distributions have it in their repositories but some wont package it because it hasn't been updated in awhile.
You're right. I didn't click on the play button.
Apparently the backend of that website is espeak. I recognize that robotic voice of espeak because I've used it in one of my projects.
"hora de laba" now, me too. That interior staircase is really freaky, but I also watch risqué cartoons! Later edit: or this <- FREE. Meh, better now? Nobody knew, except the owner of the bank it picks from?
How funky are these sounds exactly? If there's another, bigger language that has similar phonology, or if you want an (albeit somewhat robotic-sounding) speech synthesizer, you might want to try eSpeak (http://espeak.sourceforge.net/add_language.html).
It's fairly easy to learn--for a project as an undergrad I tweaked a version of classical Latin into Ecclesiastical Latin in a few weeks.
Espeak has several voice options including "whisper". I doubt it will be the same as the Mac voice, but since it's free, I suppose you've nothing to lose by trying it.
Text-to-speech engines and their file formats are far from universal.
Here are instructions on how to add a voice to eSpeak, which has an Android port. The other major TTS engines on Android -- Google TTS, Ivona, SVOX, Loquendo -- use their own proprietary formats and do not provide instructions for creating your own voices.
If you wanted to create your own TTS engine, you could take a look at eSpeak for Android and the TextToSpeechService documentation. The TTS APIs changed significantly in API 14, so you probably won't want to try and target anything lower than that. The Eyes-Free documentation referenced elsewhere in this post targets pre-API 14 and is very out-of-date.
Don't know if this will help - and sadly I don't have the time to look into it myself but I found 2 possible links to explore if someone cares to look into it:
Lexconverter - apparently this will accept IPA transcriptions.
Espeak - apparently the data from Lexconverter can be fed into Espeak.
I was looking at Jasper and was thinking about using my Pi as a voice-controlled device like that, but Jasper's Text-To-Speech sucks. This one sounds much better and clearer. Does anyone know if it's recorded audio or an actual TTS? Which one?
Edit: I just looked at the dependencies in the comments of the YouTube video (seriously, who puts that kind of stuff on YouTube with pastebinned code?) and it's Festival, while Jasper uses eSpeak.
I'm lazy and I use Ultra Hal, and I installed eSpeak (on SourceForge), so I use Ultra Hal to read things when I copy them to the clipboard. The eSpeak application doesn't do that; you have to paste the text into the app and press the "read" button. But yeah, Ultra Hal (free version) reads everything on reddit to me in eSpeak's "EN-US" voice (there's also one with some kind of UK accent).
(I actually do it because I'm a very slow reader - the lazy thing was a joke. I can have EN-US read it to me very quickly. I have it at speed 7 now. It used to be hard for me to understand even speed 3 and 4, hah, so now I blow through reddit.)
All of that is free. I don't know of any other free options, but I haven't looked into it much. I'm happy with my setup. This may not be good for people who are completely blind, I'm not sure.
It's called "eSpeak". http://espeak.sourceforge.net/
>Why hasn't a program been made which can "sound out" English with the proper accents and timing, like a person?
There are programs that do this. They're called formant synthesizers and they sound like this, which is why your GPS doesn't use them.
Ultimately, the limiting factor in speech synthesis is the complexity of the model you have to use. A system that uses just samples has no model, and therefore, while it can sound relatively human-like, it can't "reason" about how to pronounce things that it doesn't have samples for. Systems like formant synthesis attempt to model the human vocal tract and pronunciation systems, which gives them theoretically the entire vocabulary of a human, but at the expense of many "mistakes" due to our incomplete understanding of these processes.
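To make "modeling the vocal tract" a little more concrete, here is a toy sketch of the core idea: a buzzy source (an impulse train at the voice pitch) is passed through a few resonators tuned to formant frequencies. This is not how eSpeak or any production synthesizer is implemented, and the formant frequencies and bandwidths below are rough illustrative values for an "ah"-like vowel, not measured data.

```python
import math
from typing import List, Sequence, Tuple


def resonator(signal: Sequence[float], freq_hz: float, bandwidth_hz: float,
              sample_rate: int) -> List[float]:
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants: Sequence[Tuple[float, float]], pitch_hz: float = 120.0,
                     duration_s: float = 0.5, sample_rate: int = 16000) -> List[float]:
    """Return samples in [-1, 1] for a steady vowel with the given formants."""
    n = int(duration_s * sample_rate)
    period = max(1, round(sample_rate / pitch_hz))
    # Crude glottal source: one impulse per pitch period.
    samples = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    # Cascade one resonator per (frequency, bandwidth) pair.
    for freq_hz, bandwidth_hz in formants:
        samples = resonator(samples, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]


# Illustrative (assumed) formants for an "ah"-like vowel: (frequency, bandwidth) in Hz.
AH_FORMANTS = [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)]
```

Changing the three formant frequencies changes which vowel you hear, which is the sense in which the model can "pronounce" sounds it has no recordings of. The samples could be written out with Python's `wave` module to listen to the (very robotic) result.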