Polly bills by the number of characters it processes, so you are charged as soon as you upload the text and Polly processes it. As for downloading, AFAIK that would just be standard data transfer charges, and probably very little in your case compared to the cost of Polly processing: https://aws.amazon.com/polly/pricing/
Here is the cost breakdown as per the Google Assistant install instructions. Nothing major from what I can see.
A breakdown of estimated charges based upon 1000 uses per month is below. These figures are taken from https://aws.amazon.com/s3/pricing/ and https://aws.amazon.com/polly/pricing/ as of 4th June 2017.
Assuming your S3 bucket is in the US East (Northern Virginia) Region, the S3 fees based upon 1000 requests to the skill are calculated as follows:
2000 PUT requests per month: 2000 x $0.005/1000 = $ 0.01
2000 GET requests: 2000 x $0.005/1000 = $ 0.01
0.1GB of Storage per month: 0.1 x $0.023 = $ 0.01
Annual cost = $0.36
1000 requests of 100 characters(100,000 characters) per month:
100000 x $0.000004 = $ 0.40
Annual cost = $4.80 per year
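For anyone who wants to plug in their own usage, here is a minimal Python sketch of the same arithmetic. The prices are the June 2017 figures quoted above, not current ones, and rounding each line item up to the nearest cent is my assumption about how the breakdown arrived at $0.01 per item. The helper name `round_up_to_cent` is made up for this example.

```python
from decimal import Decimal, ROUND_UP

CENT = Decimal("0.01")


def round_up_to_cent(amount):
    """Round a dollar amount up to the next whole cent (assumed rounding rule)."""
    return amount.quantize(CENT, rounding=ROUND_UP)


# S3 line items per month, using the 2017 prices quoted in the breakdown.
s3_put = round_up_to_cent(Decimal(2000) * Decimal("0.005") / 1000)
s3_get = round_up_to_cent(Decimal(2000) * Decimal("0.005") / 1000)
s3_storage = round_up_to_cent(Decimal("0.1") * Decimal("0.023"))
s3_annual = (s3_put + s3_get + s3_storage) * 12

# Polly: 1000 requests of 100 characters each, at $0.000004 per character.
polly_monthly = round_up_to_cent(Decimal(1000 * 100) * Decimal("0.000004"))
polly_annual = polly_monthly * 12

print(f"S3 annual:    ${s3_annual}")
print(f"Polly annual: ${polly_annual}")
```

Swap in your own request counts and the current prices from the two pricing pages to get a figure for your case.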
If you’re interested in this, check out AWS Polly. This feature is available to voice developers; it’s only a matter of time until Amazon has it in place in their 1P products.
How about you apply some of that cloud practitioner knowledge and check out Amazon's text to speech service: https://aws.amazon.com/polly/
You get 5 million characters for free. I assume that will be more than enough for your needs, but PM me if not, as I have free credits that I'd be happy to put to use.
I really like this concept, but the TTS in Firefox on Linux is absolutely horrible. How about using something other than what's built into the browser/operating system?
Here is a small video just to show how bad the TTS is, along with a small demonstration of using Amazon Polly instead (I started working on a userscript in order to replace the TTS on my own) https://i.imgur.com/7W5rDfQ.mp4
You and /u/HyperGiant should check out Amazon Polly https://aws.amazon.com/polly/
It's an Amazon Web Service for providing text to speech files. Through the developer console you can test it out for free and that may be all you need.
I haven't seen any data claiming that software can't run on most devices.
Even Polly's FAQ says:
>On-device text-to-speech solutions require significant computing resources, notably CPU power, RAM, and disk space to be available on the device. This can result in higher development cost and higher power consumption on devices such as tablets, smartphones, etc. In contrast, text-to-speech conversion done in the cloud dramatically reduces local resource requirements. This makes it possible to support all of the available languages and voices at the highest possible quality. Moreover, speech corrections and enhancements are instantly available to all end-users and do not require additional updates for all devices. Cloud-based text-to-speech (TTS) is platform independent, so it minimizes development time and effort.
https://aws.amazon.com/polly/faqs/
They don't say it is impossible, just that it would hurt battery life. Just give people an option to increase battery life in that case, like those web accelerators that compress images to stream data faster to your device.
This is a fascinating area of research. On this website you will see a graphical comparison of listener ratings of the various TTS platforms out there. What is interesting from this is that they included some actual human readings into the mix for comparison and there were some TTS platforms that listeners preferred to the real human voices!
From their results, Amazon Polly Neural Net and Mozilla Judy Wave1 came out the best and my own research confirms this. However, this evaluation took no account of whether the voice was free / paid / subscribed etc - since this is variable. It turns out both of the top choices require the purchase of a licence for professional use.
The good news, however, is that you can use Amazon Polly for free provided you don't "abuse" the system. This means that you can upload text to their platform and, after it has been processed, it will download the result as an audio file. If the file is too big, or you submit too many requests in a given period, the system will "block" you for a short time and then let you try again. On this basis you can process 5 million characters per month.
So, how do you use these platforms?
I am currently converting a load of e-books to audiobooks and have found the easiest way is to use Balabolka. This is a free tool that allows you to edit and prepare text for conversion and then gives you the option of using voices that you currently have stored locally on your own system (such as the Microsoft default voices) or submitting the file to an online neural net platform such as Amazon Polly (there are other neural networks you can try).
So, after a couple of years hunting for the best solution, this is what I have settled on. If you pace the rate of your requests to Amazon Polly then you can do about 3 or 4 books a day.
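If you would rather script the splitting yourself than rely on a tool, here is a rough Python sketch of breaking a book into request-sized chunks at sentence boundaries. The 3000-character default reflects my understanding of Polly's per-request limit on billed characters, so treat it as an assumption and check the current limits. The function name `split_for_tts` is made up for this example, and it only prepares the text; it does not call Polly.

```python
import re


def split_for_tts(text, limit=3000):
    """Split text into chunks of at most `limit` characters.

    Breaks at sentence ends where possible; a single sentence longer
    than the limit is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_for_tts("One. Two. Three.", limit=10)
print(chunks)
```

Each chunk can then be submitted as its own request, with a pause between requests to pace the rate.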
I listened to a number of the Watson examples in English and they sound very robotic, even compared to Google's text-to-speech. When compared to offerings like Amazon Polly's neural voices, they're not even an honorable mention.
https://aws.amazon.com/polly/ (see the neural labeled units)
I believe I've found the true source of "Joanna":
There are TTS offerings from IBM Watson, Google Cloud, and AWS that can return audio, including in other languages, that would be repeatable, provided the 'voices' selected are not changed and are not deprecated.
If you are extremely picky about voices: voices of that quality currently have technical limitations that make them prohibitively expensive. They are generated server-side and streamed over the internet, so Speechify may not earn even a dime on you if you use it really extensively. To the best of my knowledge Speechify is based on Amazon Polly, which officially costs $16 for 23 hours of generated voice: https://aws.amazon.com/polly/pricing/
Obviously there are other apps that lease those voices but you can expect similar prices simply because they cost that much.
The next step down in quality would be to buy voices for your device from specialized companies. CereProc voices for Android, Windows and Mac are very good (iOS doesn't support that at the moment). But those are just voices, and you need some app to actually use them. On all those platforms you may use my app Speech Central as their companion. It surely has far more features than Speechify.
Finally, if you are on a tight budget, you should check whether you are fine with the voices that come with your device. Google now uses its own streaming voices on Android, which are surprisingly good. Samsung Android devices come with Samsung voices, which are also very good. On macOS and iOS you can install enhanced quality voices from the system settings app, and those are also very good. All those voices are compatible with my app Speech Central. You may try some other apps too, but I am quite confident that it delivers more value than any other app.
If you use iOS and you can't use Siri voices, as you mention, it is true that those voices are locked by Apple and can't be used by any other app, though they do have exceptionally high quality. In that case you may want to support this petition to Apple to unlock those voices in a safe way: https://www.change.org/p/apple-apple-please-allow-3rd-party-apps-to-use-siri-voices-for-improved-accessibility
I like it, here are a few suggestions-
• Like Nokenito mentioned, consider making the eyes smaller. I think it is making them too 'cute' for the subject matter. And mixing the media for the background from illustrated to photography is a little jarring. Consider using just one type.
• Consider using Amazon Polly as it sounds more like natural speech when you use the Neural voices. The text to speech you used does not have inflection and just kind of drones on after a while (and sounds like text to speech). Here is an example - https://d1.awsstatic.com/product-marketing/Polly/voices/Joanna%20Neural%20-%20Polly%20main%20page.28a3470e5ca050e7c40bdda756c73900f41442e6.wav
and here is a link to the Polly page - https://aws.amazon.com/polly/ .
It's free as long as you go under 5 million characters a month.
• There is an error with your lower 3rd/sound effect at 6:10.
I hope that helps!
Yes this is the AWS AI based TTS called [Amazon Polly](https://aws.amazon.com/polly/#:~:text=Polly's%20Text%2Dto%2DSpeech%20(,synthesize%20natural%20sounding%20human%20speech.&text=In%20addition%20to%20Standard%20TTS,a%20new%20machine%20learning%20approach.). You can sign up to AWS for free and give it a go yourself
I’m 99% sure that AWS doesn’t claim copyright on anything you produce with Polly.
Here is the pricing. I’ve never heard anyone say that Polly pricing is going to break their budget.
But two disclaimers:
I use TTS to generate audio for sentences when I learn foreign languages. I do this with the AwesomeTTS add-on in Anki. Dutch, and even my native language, Polish, sound way better than with Google WaveNet. At least to my ears.
Here you can compare it to Wavenet : https://cloud.google.com/text-to-speech
Azure and WaveNet use AI for TTS and are paid services. I have no clue how to use Azure TTS on your phone for now. You will have to figure it out yourself.
Another solution is to find the Ivona TTS Android APK; it is also a good solution for OFFLINE use, though not as good as AI TTS. Ivona TTS has been Amazon Polly for many years now. https://aws.amazon.com/polly/
I imagine it's probably a voice from AWS' Polly. From there it's probably Brian? On this page have a play of "Speaking Book Male Voice Example"
https://autoneasy.com/amazon-polly/12-voice-examples-of-amazon-polly/
You can use AWS Polly yourself but there'll be a learning curve https://aws.amazon.com/polly/
I was looking for a TTS before as well.
The ones that sound natural, like IBM Watson TTS, don't allow commercial use.
If you're releasing your game for free, maybe you could use a service like that.
But if you're selling the game, you'll either have to pay for a good TTS service like Amazon Polly or settle for the robotic voices, unfortunately.
Nice find. TTSMP3 appears to use Amazon Polly as its speech engine, which sounds more natural than many of the alternatives out there. I've also been impressed with the quality of Microsoft's Azure Cognitive Services, though most features require paid developer access.
It depends. Have you used an AWS application before? You need an account and to read through the documentation a bit. Depending on how savvy you are you could probably be up and running in an hour or so.
Isn’t this just Amazon Web Services Polly?
I seem to recall doing a short tutorial as part of a Cloud solution architect course with ACloudGuru? What makes yours different from the webpage I could create in 30 minutes with a little bit of copy/paste Python?
Disclaimer: this answer is kind of a joke answer (i.e., not open source per your request, but it's free at your usage level for sure), but still kinda fun.
You can use AWS's "Polly" service (https://aws.amazon.com/polly/) to generate text to speech with a bunch of different voices. Here are some examples of how you could do it if you happen to have the aws cli installed.
aws polly synthesize-speech --voice-id Joanna --text "Bleep" --output-format mp3 bleep-joanna.mp3
aws polly synthesize-speech --voice-id Joanna --text "Bloop" --output-format mp3 bloop-joanna.mp3
aws polly synthesize-speech --voice-id Joanna --text "Zap" --output-format mp3 zap-joanna.mp3
aws polly synthesize-speech --voice-id Salli --text "Bleep" --output-format mp3 bleep-salli.mp3
aws polly synthesize-speech --voice-id Salli --text "Bloop" --output-format mp3 bloop-salli.mp3
aws polly synthesize-speech --voice-id Salli --text "Zap" --output-format mp3 zap-salli.mp3
I listened to all these and they would certainly be annoying if you get a lot of mail, but maybe that's what you're after!?
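If you want more voices or sounds than you care to type out, here is a small Python sketch that generates the same kind of command lines for every voice/word pair. It only builds and prints the commands, it doesn't run them or talk to AWS. The function name `build_commands` and the file naming scheme are just my choices for the example.

```python
import itertools
import shlex

VOICES = ["Joanna", "Salli"]
WORDS = ["Bleep", "Bloop", "Zap"]


def build_commands(voices, words):
    """Return one `aws polly synthesize-speech` command line per voice/word pair."""
    commands = []
    for voice, word in itertools.product(voices, words):
        # Output file follows the word-voice.mp3 pattern used above.
        outfile = f"{word.lower()}-{voice.lower()}.mp3"
        args = [
            "aws", "polly", "synthesize-speech",
            "--voice-id", voice,
            "--text", word,
            "--output-format", "mp3",
            outfile,
        ]
        # shlex.join quotes any argument that needs it (Python 3.8+).
        commands.append(shlex.join(args))
    return commands


for command in build_commands(VOICES, WORDS):
    print(command)
```

Paste the printed lines into a shell that has the aws cli configured, or redirect them into a script file.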
This video was sourced from a surreal dystopian document-- okay, it's an infomercial for Amazon Polly. Unlike this comment, the audio is 100% unedited. It's not exactly high energy (first time video editing woo!), but that's because you didn't rehydrate at your Corporate-approved time, leech
I have a PowerShell script that uses AWS Polly to generate the spoken words using the IPA information, but it still sounds like crap half the time if you don't tweak individual sounds.
It costs about $0.10 to generate the sound files for around 3 languages at 4k words.
Awesome! Thanks, everyone, I'll look into Volumio--it seems like a great solution.
u/jquagga - you might want to look into the Node-RED Amazon Polly integration. It will give you really good TTS and has several voice options; in our case, we use a male voice with a British accent (Brian). You can add node-red-contrib-polly-tts; it will return a link to an MP3 that can be played directly by the media player. Here's a link to Amazon's docs on it: Polly
It's that, plus she wants it to sound a certain way (at least I assume so), so another engine that sounds different but is cheaper is not an option.
Amazon has much (massively...) cheaper pricing and allows commercial use, but it would sound different.
I haven't built anything specifically but I've had some fun with Naver's Papago APIs: https://developers.naver.com/docs/papago/ and Amazon's Polly API: https://aws.amazon.com/polly/
I only played with the APIs via CURL; I didn't try building anything, but you could do some cool stuff with them.
Yes, please upload as needed. I will look into uploading to archive.org.
I read the advertising for the service SuttaCentral uses. It is licence-free once you process it. This is usually rare for voice companies.
See this quote at the link below:
Amazon Polly allows for unlimited replays of generated speech without any additional fees. You can create speech files in standard formats like MP3 and OGG, and serve them from the cloud or locally with apps or devices for offline playback.
u/redparchel is correct. Amazon Polly component.
https://www.home-assistant.io/components/tts.amazon_polly/
Like I said, the delay is usually negligible; once in a while it is two or three seconds. I'm out of the country on mobile or I'd post some of my code for you.
If you get into using the markup tags (SSML) you can create very natural sounding voice responses, including breath sounds, emphasis etc.
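To give a rough idea of what that markup looks like, here is a small Python sketch that wraps a sentence in SSML with a breath sound and an emphasized word. `<speak>`, `<emphasis>` and `<amazon:breath/>` are SSML tags as I remember them from Polly's docs; not every voice supports every tag (as far as I know the neural voices skip some of these), so check the SSML reference for the voice you pick. The function name `build_ssml` is made up for this example, and it only builds the string; you would still pass it to Polly with the text type set to SSML.

```python
from html import escape


def build_ssml(sentence, stressed_word):
    """Wrap a sentence in SSML with a leading breath and one emphasized word."""
    # Escape &, < and > so the text cannot break the markup.
    safe_sentence = escape(sentence)
    safe_word = escape(stressed_word)
    marked = safe_sentence.replace(
        safe_word,
        f'<emphasis level="strong">{safe_word}</emphasis>',
        1,  # only emphasize the first occurrence
    )
    return f"<speak><amazon:breath/> {marked}</speak>"


ssml = build_ssml("Dinner is ready & waiting", "ready")
print(ssml)
```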
If you want to go for text-to-speech for whatever reason, try Amazon Polly. It is free for the first 12 months for up to 5 million characters/month, but you have to enter your credit card information before you start using it. It's Amazon, so you can trust them with that.
For voice previews, go to this page, scroll down, and check out the following voices:
US English Joanna
US English Justin
US English Salli
UK English Amy
They are the best ones in my opinion.
First of all. Big props to you for doing what you are doing. Sound is everything in game and what you are doing is amazing. I do require a link to your twitch channel though...
Second, there is plenty of text-to-speech software out there with natural voices, but of course the really good ones are not free.
The 1st comment on this thread has a bunch of places you could check out. https://www.quora.com/What-is-the-best-text-to-speech-software
Also try Amazon Polly. You could get someone to help you set it up, since it is an API that can be integrated into anything really or used by itself (I think).
Also try and find something for your smartphone (if you have one) and connect it to your pc as an audio device. That way you wouldn't have to type on your pc out of game when replying. Auto correct might be funny too
Downloading some random files off some random website would be absolutely useless as training data for an AI. It would be better to generate your own files based on a better corpus of text, and you can vary all the variables like voices, speed etc. for every text.
Looking at the HTML source, they use IVONA (now Amazon Polly), so why not go and generate as much training data as you want using: https://aws.amazon.com/polly/
Oh yes, breaking news would be a great addition. The current GalNet news became nearly useless after they added the automatic(?) powerplay reports. It takes too much time to find the interesting content. :(
Maybe audio would be possible using a speech synthesis engine if they can't get a speaker. We are already used to the voices of the flight control of the Engineer bases. ;) But as far as I remember FDev is already using services from Amazon. So they could also use Amazon Polly to create some awesome audio speech content.
Back to the encyclopedia:
The encyclopedia might be easy to implement if it's based on a real wiki which is rendered in the game. Well, stuff could also be written by players who'd like to contribute and are registered as writers at FDev. They could be rewarded with ingame extras like special skins or decals (annalist, reporter, ..).
Maybe they could also import the great Elite Dangerous Wiki into it if the creators allow it.