The project was created as part of MIT's Undergraduate Research Opportunities Program, but nobody knows for sure who is behind it for now. The author has not revealed any source code or published any paper yet, but they claim that this new technique beats SV2TTS in both data efficiency and naturalness.

"Natural emotive high-quality faster-than-real-time text-to-speech synthesis with minimal data"

That is how the author describes the website. The one part I could not agree with is the "faster-than-real-time" claim, because voice generation is quite slow and not real-time. However, this might be caused by the queue of requests sent by other users.

Characters

At the time of writing, the character voice lineup is wide but still limited. We have SpongeBob himself from SpongeBob SquarePants. The Tenth Doctor from Doctor Who is also available. Team Fortress 2 has all eight of its characters minus Pyro, who always sounds muffled, but plus the Administrator and Miss Pauling from the TF2 clips.

One unexpected lineup I saw is from My Little Pony, which has over 40 characters to select from. I was going to leave it at "the creator must be a huge fan of MLP," but then I found this tweet under the FAQ entry titled "Why are there so many MLP voices?"

Screenshot of MLP characters in 15.ai by Author

However, the difference in quality between characters with small and large amounts of training data is still quite apparent. Characters with more training data produce more natural dialogue, with clearer inflections and pauses between words, especially in longer sentences. MLP characters such as Twilight Sparkle and Fluttershy, which have upwards of 120 minutes of training data, sound way better and more natural than SpongeBob with only 27 minutes of training data. The author also noted that, due to technical reasons (approximately uniformly distributed vocal frequencies), high-pitched/feminine voices work best. I think this is because much of the training data comes from MLP characters, which generally have these characteristics.

Intonation & Emotion

Another interesting thing about 15.ai is how it uses DeepMoji to predict the emotion of a sentence. Currently, we cannot manually set the emotion of the voice; the only available choice is "Contextual," which uses DeepMoji. I decided to try out this feature, and it actually works quite well, definitely better than I expected. Twilight Sparkle will say "Today is a great day" with a sad tone, which implies sarcasm in the way it is delivered. I had previously learned about using GANs to create voice lines from Lyrebird. It seems that Lyrebird has changed over the years, because right now there is no option to train a model with your own voice.
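The "Contextual" mode can be pictured as a two-step pipeline: a DeepMoji-style classifier scores the input text over emotion classes, and the top-scoring class conditions the synthesizer's intonation. Here is a minimal toy sketch of that flow; the keyword heuristic stands in for the real neural classifier, and all function names are hypothetical, since 15.ai's actual internals are not public.

```python
# Toy sketch of a "Contextual" emotion pipeline: score the text over
# emotion classes, then pick the top class to condition the TTS voice.
# The scoring here is a trivial keyword heuristic, NOT the real DeepMoji
# model; names and labels are illustrative assumptions only.

def predict_emotion_scores(text: str) -> dict[str, float]:
    """Stand-in for a DeepMoji-style classifier (keyword heuristic)."""
    sad_markers = (":(", "unfortunately", "sadly")
    score_sad = 0.9 if any(m in text.lower() for m in sad_markers) else 0.1
    return {"sad": score_sad, "happy": 1.0 - score_sad}

def contextual_emotion(text: str) -> str:
    """Pick the emotion label the synthesizer would be conditioned on."""
    scores = predict_emotion_scores(text)
    return max(scores, key=scores.get)

# A sad marker makes a positive sentence come out sad, i.e. sarcastic,
# as in the Twilight Sparkle example above.
print(contextual_emotion("Today is a great day :("))  # prints "sad"
print(contextual_emotion("Today is a great day"))     # prints "happy"
```

The key design point this illustrates is that the emotion is inferred from the text itself rather than chosen by the user, which is why the same sentence can be rendered with different tones depending on its surrounding cues.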