Revision as of 05:07, 4 August 2023 by HackerKnownAs (478 edits): "Reverting large amounts of vandalism from IP users" (Tag: Undo) | Latest revision as of 16:31, 10 January 2025 by GregariousMadness (extended confirmed user, 1,326 edits): no edit summary | ||
(699 intermediate revisions by 61 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Real-time text-to-speech AI tool}} | |||
{{pp|small=yes}} | |||
{{Use mdy dates|date=January 2025}} | |||
{{Short description|Real-time text-to-speech tool using artificial intelligence}} | |||
{{Good article}} | |||
{{Use mdy dates|date=July 2022}} | |||
{{Infobox website | {{Infobox website | ||
| name = 15.ai | | name = 15.ai | ||
| logo_caption = {{Deletable file-caption|Thursday, 24 October 2024|F7}} | |||
| logo = File:15 ai logo transparent.png | |||
| screenshot = | | screenshot = | ||
| caption = | | caption = | ||
| founder = 15 | | founder = ] | ||
| commercial = No | | commercial = No | ||
| registration = None | | registration = None | ||
| launch_date = |
| launch_date = {{start date and age|2020|03}} | ||
| type = ], ], ] | |||
| current_status = Under maintenance | |||
| type = ], ], ], ] | |||
| website = {{URL|https://15.ai}} | | website = {{URL|https://15.ai}} | ||
| language = English | | language = English | ||
| current_status = Inactive | |||
}} | }} | ||
'''15.ai''' was a free non-commercial ] that used ] to generate ] voices of fictional characters from ].{{sfnm|遊戲|2021|Yoshiyuki|2021}} Created by an artificial intelligence researcher known as '''15''' during their time at the ], the application allowed users to make characters from ], ], and ] speak custom text with emotional inflections faster than real-time.{{efn|The term ''"faster than real-time"'' in speech synthesis means that the system can generate audio more quickly than the actual duration of the speech—for example, generating 10 seconds of speech in less than 10 seconds would be considered faster than real-time.}}{{sfnm|Kurosawa|2021|Ruppert|2021|Clayton|2021|Morton|2021|Temitope|2024}} The platform was notable for its ability to generate convincing voice output using minimal training data—the name "15.ai" referenced the creator's claim that a voice could be cloned with just 15 seconds of audio. It was an early example of an application of ] during the initial stages of the ]. | |||
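The "faster than real-time" criterion in the note above is commonly expressed as a real-time factor: synthesis time divided by the duration of the audio produced. The following is a minimal sketch; the function names are illustrative and do not come from 15.ai.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return synthesis time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_faster_than_real_time(synthesis_seconds, audio_seconds):
    """True when the audio took less time to generate than it takes to play."""
    return real_time_factor(synthesis_seconds, audio_seconds) < 1.0
```

For example, generating a 10-second clip in 5 seconds gives a factor of 0.5, which counts as faster than real-time; these timings are hypothetical.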
{{Artificial intelligence}} | |||
'''15.ai''' is a ] ] ] ] that generates natural emotive high-fidelity{{efn|The phrase "high-fidelity" in TTS research is often used to describe ]s that are able to reconstruct waveforms with very little distortion, and is not simply synonymous with "high quality." See the papers for HiFi-GAN,<ref>{{cite arXiv |last=Kong |first=Jungil|eprint=2010.05646v2 |title=HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis|class=cs |date=2020 }}</ref> GAN-TTS,<ref>{{cite arXiv |last=Binkowski |first=Mikołaj|eprint=1909.11646v2 |title=High Fidelity Speech Synthesis with Adversarial Networks|class=cs |date=2019 }}</ref> and parallel ]<ref name="deepmind"/> for unbiased examples of this usage of terminology.}} ] voices from an assortment of fictional characters from a variety of media sources.<ref name="kotaku">{{cite web | |||
|url= https://kotaku.com/this-website-lets-you-make-glados-say-whatever-you-want-1846062835 | |||
|title= Website Lets You Make GLaDOS Say Whatever You Want | |||
|last= Zwiezen | |||
|first= Zack | |||
|date= 2021-01-18 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2021-01-18 | |||
|quote= | |||
|archive-date= 2021-01-17 | |||
|archive-url= https://web.archive.org/web/20210117164748/https://kotaku.com/this-website-lets-you-make-glados-say-whatever-you-want-1846062835 | |||
|url-status= live | |||
}}</ref><ref name="gameinformer">{{cite magazine | |||
|url= https://www.gameinformer.com/gamer-culture/2021/01/18/make-portals-glados-and-other-beloved-characters-say-the-weirdest-things | |||
|title= Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App | |||
|last= Ruppert | |||
|first= Liana | |||
|date= 2021-01-18 | |||
|magazine= ] | |||
|publisher= ] | |||
|access-date= 2021-01-18 | |||
|quote= | |||
|archive-date= 2021-01-18 | |||
|archive-url= https://web.archive.org/web/20210118175543/https://www.gameinformer.com/gamer-culture/2021/01/18/make-portals-glados-and-other-beloved-characters-say-the-weirdest-things | |||
|url-status= live | |||
}}</ref><ref name="pcgamer">{{cite web | |||
|url= https://www.pcgamer.com/make-the-cast-of-tf2-recite-old-memes-with-this-ai-text-to-speech-tool | |||
|title= Make the cast of TF2 recite old memes with this AI text-to-speech tool | |||
|last= Clayton | |||
|first= Natalie | |||
|date= 2021-01-19 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2021-01-19 | |||
|quote= | |||
|archive-date= 2021-01-19 | |||
|archive-url= https://web.archive.org/web/20210119133726/https://www.pcgamer.com/make-the-cast-of-tf2-recite-old-memes-with-this-ai-text-to-speech-tool/ | |||
|url-status= live | |||
}}</ref><ref name="rockpapershotgun">{{cite web | |||
|url= https://www.rockpapershotgun.com/2021/01/18/put-words-in-game-characters-mouths-with-this-fascinating-text-to-speech-tool/ | |||
|title= Put words in game characters' mouths with this fascinating text to speech tool | |||
|last= Morton | |||
|first= Lauren | |||
|date= 2021-01-18 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2021-01-18 | |||
|quote= | |||
|archive-date= 2021-01-18 | |||
|archive-url= https://web.archive.org/web/20210118213308/https://www.rockpapershotgun.com/2021/01/18/put-words-in-game-characters-mouths-with-this-fascinating-text-to-speech-tool/ | |||
|url-status= live | |||
}}</ref> Developed by an anonymous ] researcher under the eponymous ] '''15''', the project uses a combination of ] algorithms, ] ], and ] models to generate and serve emotive character voices faster than real-time, particularly those with a very small amount of ] data. | |||
Launched in March 2020,{{sfn|Ng|2020}} 15.ai gained widespread attention in early 2021 when it went ] on social media platforms like ] and ], and quickly became popular among Internet fandoms, including the '']'', '']'', and '']'' fandoms.{{sfnm|Zwiezen|2021|Chandraseta|2021|Temitope|2024}}{{sfn|GamerSky|2021}} The service distinguished itself through its support for emotional context in speech generation through ]s and precise pronunciation control through ]s. 15.ai is credited as the first mainstream platform to popularize AI voice cloning (]s) in ] and ].{{sfnm|Speechify|2024|1ref=Speechify-2024|Temitope|2024|Anirudh VK|2023|Wright|2023}} | |||
Launched in early 2020, 15.ai began as a ] of the ] of voice acting and dubbing using technology.<ref name="thebatch"> | |||
{{cite web | |||
|url= https://blog.deeplearning.ai/blog/the-batch-ai-against-coronavirus-datasets-voice-cloning-for-the-masses-finding-unexploded-bombs-seeing-see-through-objects-optimizing-training-parameters | |||
|title= Voice Cloning for the Masses | |||
|last= Ng | |||
|first= Andrew | |||
|date= 2020-04-01 | |||
|website= The Batch | |||
|publisher= The Batch | |||
|access-date= 2020-04-05 | |||
|archive-url=https://web.archive.org/web/20200807111844/https://blog.deeplearning.ai/blog/the-batch-ai-against-coronavirus-datasets-voice-cloning-for-the-masses-finding-unexploded-bombs-seeing-see-through-objects-optimizing-training-parameters | |||
|archive-date=2020-04-08 | |||
|url-status= dead | |||
|quote= }} | |||
</ref> Its gratis and non-commercial nature (with the only stipulation being that the project be properly credited when used), ease of use, no ] registration requirement, and substantial improvements to current text-to-speech implementations have been lauded by users;<ref name="gameinformer"/><ref name="towardds">{{cite web | |||
|url= https://towardsdatascience.com/generate-your-favourite-characters-voice-lines-using-machine-learning-c0939270c0c6 | |||
|title= Generate Your Favourite Characters' Voice Lines using Machine Learning | |||
|last= Chandraseta | |||
|first= Rionaldi | |||
|date= 2021-01-19 | |||
|website= Towards Data Science | |||
|access-date= 2021-01-23 | |||
|quote= | |||
|archive-date= 2021-01-21 | |||
|archive-url= https://web.archive.org/web/20210121132456/https://towardsdatascience.com/generate-your-favourite-characters-voice-lines-using-machine-learning-c0939270c0c6 | |||
|url-status= live | |||
}}</ref><ref name="kotaku" /><ref name="pcgamer" /> however, some critics and ]s have questioned the ] and ] of leaving such technology publicly available and readily accessible.<ref name="thebatch"/><ref name="batch"/><ref name="wccftech"/> | |||
15.ai's approach to data-efficient voice synthesis and emotional expression was influential in subsequent developments in AI text-to-speech technology. In January 2022, Voiceverse NFT sparked controversy when it was discovered that the company, which had partnered with voice actor ], had misappropriated 15.ai's work for their own platform. The service was ultimately taken offline in September 2022. Its shutdown led to the emergence of various commercial alternatives in subsequent years. | |||
Credited as the impetus behind the popularization of AI ] (also known as '']'') in ] and as the first publicly available AI vocal synthesis project to involve the use of existing popular fictional characters, 15.ai has had a significant impact on multiple Internet ]s, most notably the ], '']'', and '']'' fandoms. Furthermore, 15.ai has inspired the use of ]'s '''Pony Preservation Project''' in other ] projects.<ref name="automaton"/><ref name="Denfaminicogamer"/> | |||
== History == | |||
Several commercial alternatives have spawned with the rising popularity of 15.ai, leading to cases of misattribution and theft. In January 2022, it was discovered that '''Voiceverse NFT''', a company that voice actor ] announced his partnership with, had ] 15.ai's work as part of their platform.<ref name="nme">{{cite web | |||
=== Background === | |||
|url= https://www.nme.com/news/gaming-news/voiceverse-nft-admits-to-taking-voice-lines-from-non-commercial-service-3140663 | |||
{{Broader|Deep learning speech synthesis}} | |||
|title= Voiceverse NFT admits to taking voice lines from non-commercial service | |||
]s) between Tacotron and a modified variant of Tacotron]] | |||
|last= Williams | |||
The field of artificial ] underwent a significant transformation with the introduction of ] approaches.{{sfn|Barakat|2024}} In 2016, ]'s publication of the seminal paper ''WaveNet: A Generative Model for Raw Audio'' marked a pivotal shift toward ]-based speech synthesis, demonstrating unprecedented audio quality through ] ] operating directly on raw audio waveforms at 16,000 samples per second, modeling the ] of each audio sample given all previous ones. Previously, ] (which worked by stitching together pre-recorded segments of human speech) was the predominant method for generating artificial speech, but it often produced robotic-sounding results with noticeable artifacts at the segment boundaries.{{sfn|van den Oord|2016}} In 2018, ]'s Tacotron followed, demonstrating that neural networks could produce highly natural speech synthesis but required substantial training data (typically tens of hours of audio) to achieve acceptable quality. When trained on smaller datasets, such as 2 hours of speech, output quality degraded but the speech remained intelligible; with just 24 minutes of training data, Tacotron failed to produce intelligible speech.<ref name="Google">{{harvnb|Google|2018}}</ref> In 2020, HiFi-GAN emerged as a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech,{{sfnm|Kong|2020}} followed by Glow-TTS, which introduced a ] approach that allowed for both fast inference and voice style transfer capabilities.{{sfnm|Kim|2020}} Chinese tech companies also made significant contributions to the field, with ] and ] developing proprietary text-to-speech frameworks that further advanced the state of the art, though specific technical details of their implementations remained largely undisclosed.{{sfn|Temitope|2024}} | |||
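The autoregressive formulation described above can be written compactly. Writing a waveform as a sequence of samples <math>x_1, \ldots, x_T</math> (notation chosen here for illustration), the joint probability factorizes into one conditional term per sample:

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

At 16,000 samples per second, one second of audio corresponds to <math>T = 16{,}000</math> factors, each conditioned on all earlier samples.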
|first= Demi | |||
|date= 2022-01-18 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-01-18 | |||
|quote= | |||
|archive-date= 2022-01-18 | |||
|archive-url= https://web.archive.org/web/20220118162845/https://www.nme.com/news/gaming-news/voiceverse-nft-admits-to-taking-voice-lines-from-non-commercial-service-3140663 | |||
|url-status= live | |||
}}</ref><ref name="stevivor">{{cite web | |||
|url= https://stevivor.com/news/troy-baker-nft-voiceverse-15-ai/ | |||
|title= Troy Baker-backed NFT company admits to using content without permission | |||
|last= Wright | |||
|first= Steve | |||
|date= 2022-01-17 | |||
|website= Stevivor | |||
|access-date= 2022-01-17 | |||
|quote= | |||
|archive-date= 2022-01-17 | |||
|archive-url= https://web.archive.org/web/20220117231918/https://stevivor.com/news/troy-baker-nft-voiceverse-15-ai/ | |||
|url-status= live | |||
}}</ref><ref name="techtimes">{{cite web | |||
|url= https://www.techtimes.com/articles/270688/20220118/troy-bakers-partner-nft-company-voiceverse-reportedly-steals-voice-lines.htm | |||
|title= Troy Baker's Partner NFT Company Voiceverse Reportedly Steals Voice Lines From 15.ai | |||
|last= Henry | |||
|first= Joseph | |||
|date= 2022-01-18 | |||
|website= Tech Times | |||
|access-date= 2022-02-14 | |||
|quote= | |||
|archive-date= 2022-01-26 | |||
|archive-url= https://web.archive.org/web/20220126204741/https://www.techtimes.com/articles/270688/20220118/troy-bakers-partner-nft-company-voiceverse-reportedly-steals-voice-lines.htm | |||
|url-status= live | |||
}}</ref> | |||
=== Development, release, and operation === | |||
On September 8, 2022, 15.ai was temporarily taken down in preparation for an upcoming update, a year after its last stable release (v24.2.1). As of June 6, 2023, it is still temporarily offline.<ref>{{Cite web |title=15 on Twitter: "(I probably won't open Twitter again until I finally get this up and running.)" / Twitter |url=https://twitter.com/fifteenai/status/1628834756736479238 |access-date=2023-06-06 |website=Twitter |language=en}}</ref> | |||
{{Quote box | |||
|quote= The website has multiple purposes. It serves as a proof of concept of a platform that allows anyone to create content, even if they can't hire someone to voice their projects. | |||
It also demonstrates the progress of my research in a far more engaging manner - by being able to use the actual model, you can discover things about it that even I wasn't aware of (such as getting characters to make gasping noises or moans by placing commas in between certain phonemes). | |||
== Features == | |||
], known for his sinister robotic voice, is one of the available characters on 15.ai.<ref name="kotaku"/>]] | |||
Available characters include ] and ] from '']'', characters from '']'', ] and a number of ] from '']'', ] from '']'', ] and ] from '']'', the ] from '']'', ] from '']'', the Narrator from '']'', the ]/] ] Announcer (formerly), ] from '']'', ] from '']'', Dan from '']'', and ] from '']''.<ref name="Denfaminicogamer">{{cite web | |||
|url= https://news.denfaminicogamer.jp/news/210118f | |||
|title= 『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に | |||
|last= Yoshiyuki | |||
|first= Furushima | |||
|date= 2021-01-18 | |||
|website= Denfaminicogamer | |||
|access-date= 2021-01-18 | |||
|quote= | |||
|archive-date= 2021-01-18 | |||
|archive-url= https://web.archive.org/web/20210118051321/https://news.denfaminicogamer.jp/news/210118f | |||
|url-status= live | |||
}}</ref><ref name="automaton">{{cite web | |||
|url= https://automaton-media.com/articles/newsjp/20210119-149494/ | |||
|title= ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる | |||
|last= Kurosawa | |||
|first= Yuki | |||
|date= 2021-01-19 | |||
|website= AUTOMATON | |||
|publisher= AUTOMATON | |||
|access-date= 2021-01-19 | |||
|quote= | |||
|archive-date= 2021-01-19 | |||
|archive-url= https://web.archive.org/web/20210119103031/https://automaton-media.com/articles/newsjp/20210119-149494/ | |||
|url-status= live | |||
}}</ref><ref name="LaPS4">{{cite web | |||
|url= https://www.laps4.com/noticias/descubre-15-ai-un-sitio-web-en-el-que-podras-hacer-que-glados-diga-lo-que-quieras/ | |||
|title= Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras | |||
|last= Villalobos | |||
|first= José | |||
|date= 2021-01-18 | |||
|website= LaPS4 | |||
|publisher= LaPS4 | |||
|access-date= 2021-01-18 | |||
|quote= | |||
|archive-date= 2021-01-18 | |||
|archive-url= https://web.archive.org/web/20210118172043/https://www.laps4.com/noticias/descubre-15-ai-un-sitio-web-en-el-que-podras-hacer-que-glados-diga-lo-que-quieras/ | |||
|url-status= live | |||
}}</ref><ref name="yahoofin">{{cite web | |||
|url= https://es-us.finanzas.yahoo.com/noticias/15-ai-sitio-te-permite-152000712.html | |||
|title= 15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras | |||
|last= Moto | |||
|first= Eugenio | |||
|date= 2021-01-20 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2021-01-20 | |||
|quote= | |||
|archive-date= 2022-03-08 | |||
|archive-url= https://web.archive.org/web/20220308230836/https://es-us.finanzas.yahoo.com/noticias/15-ai-sitio-te-permite-152000712.html | |||
|url-status= live | |||
}}</ref> | |||
It also doesn't let me get away with picking and choosing the best results and showing off only the ones that work. Being able to interact with the model with no filter allows the user to judge exactly how good the current work is at face value. | |||
The ] model used by the application is ]: each time speech is generated from the same string of text, its intonation differs slightly. The application also supports manually altering the ] of a generated line using ''emotional contextualizers'' (a term coined by this project): sentences or phrases that convey the emotion of the take and guide the model during inference.<ref name="towardds" /><ref name="automaton"/><ref name="Denfaminicogamer"/> | |||
|author=15 | |||
Emotional contextualizers are representations of the emotional content of a sentence deduced via ] ] ] using DeepMoji, a deep neural network ] algorithm developed by the ] in 2017.<ref>{{cite book |last=Felbo |first=Bjarke |arxiv=1708.00524 |title=Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing|chapter=Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm |date=2017 |pages=1615–1625 |doi=10.18653/v1/D17-1169 |s2cid=2493033 }}</ref><ref>{{cite web | |||
|source=]<ref name="hn">{{harvnb|Hacker News|2022}}</ref> | |||
|url= https://www.theregister.com/2017/08/07/sarcasm_detector_bot_mit/ | |||
|align=right | |||
|title= A sarcasm detector bot? That sounds absolutely brilliant. Definitely | |||
|width=400px | |||
|last= Corfield | |||
|first= Gareth | |||
|date= 2017-08-07 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-02 | |||
|archive-date= 2022-06-02 | |||
|archive-url= https://web.archive.org/web/20220602215737/https://www.theregister.com/2017/08/07/sarcasm_detector_bot_mit/ | |||
|url-status= live | |||
}}</ref> DeepMoji was trained on 1.2 billion emoji occurrences in ] data from 2013 to 2017, and has been found to outperform human subjects in correctly identifying sarcasm in Tweets and other online modes of communication.<ref>{{cite web | |||
|url= https://www.technologyreview.com/2017/08/03/105566/an-algorithm-trained-on-emoji-knows-when-youre-being-sarcastic-on-twitter/ | |||
|title= An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter | |||
|last= | |||
|first= | |||
|date= 2017-08-03 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-02 | |||
|archive-date= 2022-06-02 | |||
|archive-url= https://web.archive.org/web/20220602215737/https://www.technologyreview.com/2017/08/03/105566/an-algorithm-trained-on-emoji-knows-when-youre-being-sarcastic-on-twitter/ | |||
|url-status= live | |||
}}</ref><ref>{{cite web | |||
|url= https://www.bbc.com/news/technology-40850171 | |||
|title= Emojis help software spot emotion and sarcasm | |||
|last= | |||
|first= | |||
|date= 2017-08-07 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-02 | |||
|archive-date= 2022-06-02 | |||
|archive-url= https://web.archive.org/web/20220602215735/https://www.bbc.com/news/technology-40850171 | |||
|url-status= live | |||
}}</ref><ref>{{cite web | |||
|url= https://www.newsweek.com/emoji-computer-sarcasm-emotion-training-hate-speech-647474 | |||
|title= Emoji-Filled Mean Tweets Help Scientists Create Sarcasm-Detecting Bot That Could Uncover Hate Speech | |||
|last= Lowe | |||
|first= Josh | |||
|date= 2017-08-07 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-02 | |||
|archive-date= 2022-06-02 | |||
|archive-url= https://web.archive.org/web/20220602215735/https://www.newsweek.com/emoji-computer-sarcasm-emotion-training-hate-speech-647474 | |||
|url-status= live | |||
}}</ref> | |||
15.ai uses a ''multi-speaker model''—hundreds of voices are trained concurrently rather than sequentially, decreasing the required training time and enabling the model to learn and generalize shared emotional context, even for voices with no exposure to such emotional context.<ref name="arxivmello">{{cite arXiv |last=Valle |first=Rafael |eprint=1910.11997 |title=Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens |class=eess |date=2020 }}</ref> Consequently, the entire lineup of characters in the application is powered by a single trained model, as opposed to multiple single-speaker models trained on different datasets.<ref name="arxivmulti">{{cite arXiv |last=Cooper |first=Erica |eprint=1910.10838 |title=Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings |class=eess |date=2020 }}</ref> The ] used by 15.ai has been scraped from a variety of Internet sources, including ], ], the ], ], ], and ]. Pronunciations of unfamiliar words are automatically deduced using ]s learned by the deep learning model.<ref name="automaton"/> | |||
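One common way to serve many voices from a single set of weights is to condition the shared network on a per-speaker embedding vector. The toy sketch below illustrates that general idea only. It is not 15.ai's architecture, and every name and number in it is hypothetical.

```python
import math

# Hypothetical toy embeddings; a real system learns these from data.
SPEAKER_EMBEDDINGS = {
    "speaker_a": [0.5, -0.25],
    "speaker_b": [-0.75, 1.0],
}

# One weight matrix shared by every speaker (rows: inputs, columns: outputs).
SHARED_WEIGHTS = [
    [0.2, -0.1],
    [0.4, 0.3],
    [-0.5, 0.6],
    [0.1, 0.7],
]


def decode_frame(text_features, speaker):
    """Apply the shared weights to text features joined with a speaker embedding."""
    inputs = list(text_features) + SPEAKER_EMBEDDINGS[speaker]
    if len(inputs) != len(SHARED_WEIGHTS):
        raise ValueError("expected %d inputs" % len(SHARED_WEIGHTS))
    return [
        math.tanh(sum(x * row[j] for x, row in zip(inputs, SHARED_WEIGHTS)))
        for j in range(len(SHARED_WEIGHTS[0]))
    ]
```

The same text features produce different outputs for different speakers, while all speakers update and reuse the same weights. In this sketch, adding a voice means adding one embedding rather than training a separate model.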
The application supports a simplified version of a set of English phonetic transcriptions known as ARPABET to correct mispronunciations or to account for ] (words that are spelled the same but pronounced differently, such as the word ''read'', which can be pronounced as either {{IPAc-en|ˈ|r|ɛ|d}} or {{IPAc-en|ˈ|r|iː|d}} depending on its ]). While the original ARPABET codes developed in the 1970s by the ] support 50 unique symbols to designate and differentiate between English phonemes,<ref name="klautau">{{cite web|last=Klautau|first=Aldebaro|year=2001|url=http://www.laps.ufpa.br/aldebaro/papers/ak_arpabet01.pdf|title=ARPABET and the TIMIT alphabet|access-date=September 8, 2017|archive-url=https://web.archive.org/web/20160603180727/http://www.laps.ufpa.br/aldebaro/papers/ak_arpabet01.pdf|archive-date=June 3, 2016}}</ref> the CMU Pronouncing Dictionary's ARPABET convention (the set of transcription codes followed by 15.ai<ref name="automaton" />) reduces the symbol set to 39 phonemes by combining ] phonetic realizations into a single standard (e.g. <code>]</code>; <code>]/]</code>) and using multiple common symbols together to replace ] (e.g. <code>EN/AH0 N</code>).<ref name="columbia">{{cite web | |||
|url= http://www.cs.columbia.edu/~julia/courses/CS6998-2019/%5B07%5D%20Phonetics.pdf | |||
|title= Phonetics | |||
|last= | |||
|first= | |||
|date= 2017 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-11 | |||
|url-status= live | |||
|archive-date= 2022-06-19 | |||
|archive-url= https://web.archive.org/web/20220619180213/http://www.cs.columbia.edu/~julia/courses/CS6998-2019/%5B07%5D%20Phonetics.pdf | |||
}}</ref><ref name="prondicts">{{cite thesis | |||
|type=MSc | |||
|last=Loots | |||
|first=Linsen | |||
|date=March 2010 | |||
|title=Data-Driven Augmentation of Pronunciation Dictionaries | |||
|publisher=Stellenbosch University, Department of Electrical & Electronic Engineering | |||
|citeseerx=10.1.1.832.2872 | |||
|url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.832.2872&rep=rep1&type=pdf | |||
|access-date=2022-06-11 | |||
|url-status=live | |||
|quote=Table 3.2 | |||
|archive-date=2022-06-11 | |||
|archive-url=https://web.archive.org/web/20220611175904/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.832.2872&rep=rep1&type=pdf | |||
}}</ref> ARPABET strings can be invoked in the application by wrapping the string of phonemes in ] within the input box (e.g. <code>{AA1 R P AH0 B EH2 T}</code> to denote {{IPAc-en|ˈ|ɑːr|p|ə|ˌ|b|ɛ|t}}, the pronunciation of the word ''ARPABET'').<ref name="automaton" /> | |||
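The curly-brace convention described above can be sketched as a small parser that separates ordinary text from ARPABET runs. This is an illustrative sketch only: the function name and the segment format are assumptions, and the application's actual input handling is not documented here.

```python
import re

# Matches one curly-braced run of ARPABET symbols, e.g. "{AA1 R P AH0 B EH2 T}".
ARPABET_SPAN = re.compile(r"\{([^{}]*)\}")


def split_input(text):
    """Split text into ("text", str) and ("phonemes", [str, ...]) segments."""
    segments = []
    pos = 0
    for match in ARPABET_SPAN.finditer(text):
        if match.start() > pos:
            # Plain text that precedes this braced run.
            segments.append(("text", text[pos:match.start()]))
        segments.append(("phonemes", match.group(1).split()))
        pos = match.end()
    if pos < len(text):
        # Trailing plain text after the last braced run.
        segments.append(("text", text[pos:]))
    return segments
```

For the homograph ''read'', an input such as <code>I {R EH1 D} it.</code> yields a text segment, a phoneme segment <code>["R", "EH1", "D"]</code>, and a trailing text segment, so the past-tense pronunciation can be forced explicitly.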
The following is a table of phonemes used by 15.ai and the CMU Pronouncing Dictionary:<ref name="cmudict"> | |||
{{cite web | |||
|url= http://www.speech.cs.cmu.edu/cgi-bin/cmudict | |||
|title= The CMU Pronouncing Dictionary | |||
|last= | |||
|first= | |||
|date= 2015-07-16 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-04 | |||
|archive-date= 2022-06-03 | |||
|archive-url= https://web.archive.org/web/20220603181334/http://www.speech.cs.cmu.edu/cgi-bin/cmudict | |||
|url-status= live}}</ref> | |||
<div style="text-align: center;"> | |||
{| class="wikitable" style="display:inline-table; text-align: center;" | |||
|+ Vowels | |||
! colspan="1" | ARPABET | |||
! colspan="1" | Respelling | |||
! rowspan="1" | IPA | |||
! rowspan="1" | Example | |||
|- | |||
| <code>AA</code> | |||
| ''ah'' | |||
| {{IPA link|ä|ɑ}} | |||
| style="text-align:left" | '''o'''dd | |||
|- | |||
| <code>AE</code> | |||
| ''a'' | |||
| {{IPA link|æ}} | |||
| style="text-align:left" | '''a'''t | |||
|- | |||
| <code>AH0</code> | |||
| ''ə'' | |||
| {{IPA link|ə}} | |||
| style="text-align:left" | '''a'''bout | |||
|- | |||
| <code>AH</code> | |||
| ''u, uh'' | |||
| {{IPA link|ʌ}} | |||
| style="text-align:left" | h'''u'''t | |||
|- | |||
| <code>AO</code> | |||
| ''aw'' | |||
| {{IPA link|ɔ}} | |||
| style="text-align:left" | '''ou'''ght | |||
|- | |||
| <code>AW</code> | |||
| ''ow'' | |||
| {{IPA|aʊ}} | |||
| style="text-align:left" | c'''ow''' | |||
|- | |||
| <code>AY</code> | |||
| ''eye'' | |||
| {{IPA|aɪ}} | |||
| style="text-align:left" | h'''i'''de | |||
|- | |||
| <code>EH</code> | |||
| ''e, eh'' | |||
| {{IPA link|ɛ}} | |||
| style="text-align:left" | '''E'''d | |||
|- | |||
|} | |||
{| class="wikitable" style="display:inline-table; text-align: center; margin-right: 2em;" | |||
|+ Vowels | |||
! colspan="1" | ARPABET | |||
! colspan="1" | Respelling | |||
! rowspan="1" | IPA | |||
! rowspan="1" | Example | |||
|- | |||
| <code>ER</code> | |||
| ''ur'', ''ər'' | |||
| {{IPA link|ɝ}}, {{IPA link|ɚ}} | |||
| style="text-align:left" | h'''ur'''t | |||
|- | |||
| <code>EY</code> | |||
| ''ay'' | |||
| {{IPA|eɪ}} | |||
| style="text-align:left" | '''a'''te | |||
|- | |||
| <code>IH</code> | |||
| ''i'', ''ih'' | |||
| {{IPA link|ɪ}} | |||
| style="text-align:left" | '''i'''t | |||
|- | |||
| <code>IY</code> | |||
| ''ee'' | |||
| {{IPA link|i}} | |||
| style="text-align:left" | '''ea'''t | |||
|- | |||
| <code>OW</code> | |||
| ''oh'' | |||
| {{IPA|oʊ}} | |||
| style="text-align:left" | '''oa'''t | |||
|- | |||
| <code>OY</code> | |||
| ''oy'' | |||
| {{IPA|ɔɪ}} | |||
| style="text-align:left" | t'''oy''' | |||
|- | |||
| <code>UH</code> | |||
| ''uu'' | |||
| {{IPA link|ʊ}} | |||
| style="text-align:left" | h'''oo'''d | |||
|- | |||
| <code>UW</code> | |||
| ''oo'' | |||
| {{IPA link|u}} | |||
| style="text-align:left" | t'''wo''' | |||
|} | |||
{| class="wikitable" style="display:inline-table; text-align: center; margin-right: 2em;" | |||
|+ Stress | |||
! AB | |||
! Description | |||
|- | |||
| style="text-align:center" | 0 | |||
| No stress | |||
|- | |||
| style="text-align:center" | 1 | |||
| Primary stress | |||
|- | |||
| style="text-align:center" | 2 | |||
| Secondary stress | |||
|} | |||
{| class="wikitable" style="display:inline-table; text-align: center;" | |||
|+ Consonants | |||
! colspan="1" | ARPABET | |||
! colspan="1" | Respelling | |||
! rowspan="1" | IPA | |||
! rowspan="1" | Example | |||
|- | |||
| <code>B</code> | |||
| ''b'' | |||
| {{IPA link|b}} | |||
| style="text-align:left" | '''b'''e | |||
|- | |||
| <code>CH</code> | |||
| ''ch'', ''tch'' | |||
| {{IPA link|tʃ}} | |||
| style="text-align:left" | '''ch'''eese | |||
|- | |||
| <code>D</code> | |||
| ''d'' | |||
| {{IPA link|d}} | |||
| style="text-align:left" | '''d'''ee | |||
|- | |||
| <code>DH</code> | |||
| ''dh'' | |||
| {{IPA link|ð}} | |||
| style="text-align:left" | '''th'''ee | |||
|- | |||
| <code>F</code> | |||
| ''f'' | |||
| {{IPA link|f}} | |||
| style="text-align:left" | '''f'''ee | |||
|- | |||
| <code>G</code> | |||
| ''g'' | |||
| {{IPA link|ɡ}} | |||
| style="text-align:left" | '''g'''reen | |||
|- | |||
| <code>HH</code> | |||
| ''h'' | |||
| {{IPA link|h}} | |||
| style="text-align:left" | '''h'''e | |||
|- | |||
| <code>JH</code> | |||
| ''j'' | |||
| {{IPA link|dʒ}} | |||
| style="text-align:left" | '''g'''ee | |||
|- | |||
|} | |||
{| class="wikitable" style="display:inline-table; text-align: center;" | |||
|+ Consonants | |||
! colspan="1" | ARPABET | |||
! colspan="1" | Respelling | |||
! rowspan="1" | IPA | |||
! rowspan="1" | Example | |||
|- | |||
| <code>K</code> | |||
| ''k'' | |||
| {{IPA link|k}} | |||
| style="text-align:left" | '''k'''ey | |||
|- | |||
| <code>L</code> | |||
| ''l'' | |||
| {{IPA link|l}} | |||
| style="text-align:left" | '''l'''ee | |||
|- | |||
| <code>M</code> | |||
| ''m'' | |||
| {{IPA link|m}} | |||
| style="text-align:left" | '''m'''e | |||
|- | |||
| <code>N</code> | |||
| ''n'' | |||
| {{IPA link|n}} | |||
| style="text-align:left" | '''kn'''ee | |||
|- | |||
| <code>NG</code> | |||
| ''ng'' | |||
| {{IPA link|ŋ}} | |||
| style="text-align:left" | pi'''ng''' | |||
|- | |||
| <code>P</code> | |||
| ''p'' | |||
| {{IPA link|p}} | |||
| style="text-align:left" | '''p'''ee | |||
|- | |||
| <code>R</code> | |||
| ''r'' | |||
| {{IPA link|r}} | |||
| style="text-align:left" | '''r'''ead | |||
|- | |||
| <code>S</code> | |||
| ''s'', ''ss'' | |||
| {{IPA link|s}} | |||
| style="text-align:left" | '''s'''ea | |||
|} | |||
{| class="wikitable" style="display:inline-table; text-align: center;" | |||
|+ Consonants | |||
! colspan="1" | ARPABET | |||
! colspan="1" | Respelling | |||
! rowspan="1" | IPA | |||
! rowspan="1" | Example | |||
|- | |||
| <code>SH</code> | |||
| ''sh'' | |||
| {{IPA link|ʃ}} | |||
| style="text-align:left" | '''sh'''e | |||
|- | |||
| <code>T</code> | |||
| ''t'' | |||
| {{IPA link|t}} | |||
| style="text-align:left" | '''t'''ea | |||
|- | |||
| <code>TH</code> | |||
| ''th'' | |||
| {{IPA link|θ}} | |||
| style="text-align:left" | '''th'''eta | |||
|- | |||
| <code>V</code> | |||
| ''v'' | |||
| {{IPA link|v}} | |||
| style="text-align:left" | '''v'''ee | |||
|- | |||
| <code>W</code> | |||
| ''w'', ''wh'' | |||
| {{IPA link|w}} | |||
| style="text-align:left" | '''w'''e | |||
|- | |||
| <code>Y</code> | |||
| ''y'' | |||
| {{IPA link|j}} | |||
| style="text-align:left" | '''y'''ield | |||
|- | |||
| <code>Z</code> | |||
| ''z'' | |||
| {{IPA link|z}} | |||
| style="text-align:left" | '''z'''ee | |||
|- | |||
| <code>ZH</code> | |||
| ''zh'' | |||
| {{IPA link|ʒ}} | |||
| style="text-align:left" | sei'''z'''ure | |||
|} | |||
</div> | |||
{{clear}} | |||
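The tables above can be read as a lookup from ARPABET symbols to IPA. The sketch below encodes them as dictionaries and converts a phoneme sequence to an approximate IPA string. As simplifying assumptions, the stress mark is placed directly before the stressed vowel rather than at the start of its syllable, and unstressed <code>ER</code> is rendered as ɚ. The function and variable names are illustrative.

```python
# Vowels and consonants as listed in the tables above (39 symbols in total).
VOWELS = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ",
    "AY": "aɪ", "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ",
    "IY": "i", "OW": "oʊ", "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
}
CONSONANTS = {
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "r", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}
# Vowels that take a reduced form when marked with stress 0.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}
# Stress digits: 0 = no stress, 1 = primary, 2 = secondary.
STRESS_MARKS = {0: "", 1: "ˈ", 2: "ˌ"}


def to_ipa(phonemes):
    """Convert a list of ARPABET symbols to an approximate IPA string."""
    pieces = []
    for token in phonemes:
        if token in CONSONANTS:
            pieces.append(CONSONANTS[token])
            continue
        # Vowels may carry a trailing stress digit (0, 1, or 2).
        if token[-1:] in ("0", "1", "2"):
            base, stress = token[:-1], int(token[-1])
        else:
            base, stress = token, None
        if base not in VOWELS:
            raise ValueError(f"unknown ARPABET symbol: {token!r}")
        if stress == 0 and base in UNSTRESSED:
            pieces.append(UNSTRESSED[base])
        else:
            pieces.append(STRESS_MARKS.get(stress, "") + VOWELS[base])
    return "".join(pieces)


print(to_ipa("AA1 R P AH0 B EH2 T".split()))  # prints ˈɑrpəbˌɛt
```

Rejecting unknown symbols with an error, instead of passing them through, makes typos in a hand-written phoneme string visible before synthesis.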
== Background == | |||
=== Speech synthesis === | |||
{{Main|Deep learning speech synthesis}} | |||
{{See also|Audio deepfake}} | |||
]'s ].<ref name="deepmind" />]] | |||
In 2016, with the proposal of ]'s ], deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating human-like speech.<ref name="arxiv1">{{cite arXiv |last=Hsu |first=Wei-Ning |eprint=1810.07217 |title=Hierarchical Generative Modeling for Controllable Speech Synthesis |class=cs.CL |date=2018 }}</ref><ref name="arxiv2">{{cite arXiv |last=Habib |first=Raza |eprint=1910.01709 |title=Semi-Supervised Generative Modeling for Controllable Speech Synthesis |class=cs.CL |date=2019 }}</ref><ref name="deepmind">{{cite web|url=https://www.deepmind.com/blog/high-fidelity-speech-synthesis-with-wavenet|title=High-fidelity speech synthesis with WaveNet|last1=van den Oord|first1=Aäron|last2=Li|first2=Yazhe|last3=Babuschkin|first3=Igor|date=2017-11-12|website=]|access-date=2022-06-05|archive-date=2022-06-18|archive-url=https://web.archive.org/web/20220618205838/https://www.deepmind.com/blog/high-fidelity-speech-synthesis-with-wavenet|url-status=live}}</ref><ref name="thebatch"/> Tacotron2, a neural network architecture for speech synthesis developed by ], was published in 2018 and required tens of hours of audio data to produce intelligible speech; when trained on 2 hours of speech, the model was able to produce intelligible speech with mediocre quality, and when trained on 36 minutes of speech, the model was unable to produce intelligible speech.<ref name="tacotron">{{cite web|url=https://google.github.io/tacotron/publications/semisupervised/index.html|title=Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"|date=2018-08-30|access-date=2022-06-05|archive-date=2020-11-11|archive-url=https://web.archive.org/web/20201111222714/https://google.github.io/tacotron/publications/semisupervised/index.html|url-status=live}}</ref><ref name="arxiv3">{{cite arXiv |eprint=1712.05884 |title=Natural TTS Synthesis by Conditioning WaveNet on Mel-Spectrogram Predictions |class=cs.CL 
|date=2018 |last1=Shen |first1=Jonathan |last2=Pang |first2=Ruoming |last3=Weiss |first3=Ron J. |last4=Schuster |first4=Mike |last5=Jaitly |first5=Navdeep |last6=Yang |first6=Zongheng |last7=Chen |first7=Zhifeng |last8=Zhang |first8=Yu |last9=Wang |first9=Yuxuan |last10=Skerry-Ryan |first10=RJ |last11=Saurous |first11=Rif A. |last12=Agiomyrgiannakis |first12=Yannis |last13=Wu |first13=Yonghui }}</ref> | |||
For years, reducing the amount of data required to train a realistic high-quality text-to-speech model has been a primary goal of scientific researchers in the field of deep learning speech synthesis.<ref>{{cite arXiv |last=Chung |first=Yu-An |eprint=1808.10128 |title=Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis |class=cs.CL |date=2018 }}</ref><ref>{{cite arXiv |last=Ren |first=Yi |eprint=1905.06791 |title=Almost Unsupervised Text to Speech and Automatic Speech Recognition |class=cs.CL |date=2019 }}</ref> The developer of 15.ai claims that as little as 15 seconds of data is sufficient to clone a voice up to human standards, a significant reduction in the amount of data required.<ref name="towardds"/><ref name="eurogamer"/> | |||
=== Copyrighted material in deep learning === | |||
{{Main|Authors Guild, Inc. v. Google, Inc.}} | |||
In the landmark case ''Authors Guild, Inc. v. Google, Inc.'', a court ruled in 2013 that Google Books, a service that searches the full text of printed copyrighted books, was transformative, thus meeting all requirements for fair use.<ref>''Authors Guild v. Google, Inc.'' (2d Cir. 2015), 2015 U.S. App. LEXIS 17988 (October 16, 2015).</ref> This case set an important legal precedent for the field of deep learning and artificial intelligence: using copyrighted material to train a discriminative model or a ''non-commercial'' generative model was deemed legal.<ref name="tds"/> The legality of ''commercial'' generative models trained using copyrighted material is still under debate; due to the black-box nature of machine learning models, any allegations of copyright infringement via direct competition would be difficult to prove.<ref name="tds">{{cite web | |||
|url= https://towardsdatascience.com/the-most-important-supreme-court-decision-for-data-science-and-machine-learning-44cfc1c1bcaf | |||
|title= The Most Important Court Decision For Data Science and Machine Learning | |||
|last= Stewart | |||
|first= Matthew | |||
|date= 2019-10-31 | |||
|website= Towards Data Science | |||
|access-date= 2022-02-21 | |||
|quote= | |||
|archive-date= 2022-02-21 | |||
|archive-url= https://web.archive.org/web/20220221223206/https://towardsdatascience.com/the-most-important-supreme-court-decision-for-data-science-and-machine-learning-44cfc1c1bcaf | |||
|url-status= live | |||
}}</ref> | |||
== Development == | |||
15.ai was designed and created by an anonymous research scientist affiliated with the Massachusetts Institute of Technology (MIT) known by the alias ''15''.<ref name="twitter"> | |||
{{cite web | |||
|url= https://twitter.com/fifteenai | |||
|title= 15 | |||
|last= | |||
|first= | |||
|date= 2022-06-09 | |||
|website= ] | |||
|publisher= | |||
|access-date= 2022-06-09 | |||
|quote= }} | |||
</ref> The project began development while the developer was an undergraduate. The developer has stated that they are capable of paying the high cost of running the site out of pocket.<ref name="towardds"/> | |||
According to posts made by its developer on Hacker News, 15.ai costs several thousand dollars per month to operate; they are able to support the project due to a successful startup exit.<ref name="hn">{{cite web | |||
|url= https://news.ycombinator.com/item?id=31711118 | |||
|title= 15.ai | |||
|last= | |||
|first= | |||
|date= 2022-06-12 | |||
|website= ] | |||
|publisher= | |||
|access-date= 2022-06-13 | |||
|quote= | |||
|archive-date= 2022-06-13 | |||
|archive-url= https://web.archive.org/web/20220613000443/http://news.ycombinator.com/item?id=31711118 | |||
|url-status= live | |||
}}</ref> The developer has stated that during their undergraduate years at MIT, they were paid the ] to work on a related project (approximately $14 an hour in ]<ref>{{cite web | |||
|url= https://urop.mit.edu/guidelines/participation-considerations/pay-credit-volunteer/ | |||
|title= Pay, Credit & Volunteer | |||
|last= | |||
|first= | |||
|date= | |||
|website= ] ] | |||
|publisher= | |||
|access-date= 2022-06-13 | |||
|quote= | |||
|archive-date= 2022-06-19 | |||
|archive-url= https://web.archive.org/web/20220619234437/https://urop.mit.edu/guidelines/participation-considerations/pay-credit-volunteer/ | |||
|url-status= live | |||
}}</ref>) that eventually evolved into 15.ai. They also stated that the democratization of voice cloning technology is not the only function of the website; in response to a user asking whether the research could be conducted without a public website, the developer wrote: | |||
{{Blockquote | |||
|text= The website has multiple purposes. It serves as a ] of a platform that allows anyone to create ], even if they can't hire someone to voice their projects. | |||
It also demonstrates the progress of my research in a far more engaging manner—by being able to use the actual model, you can discover things about it that even I wasn't aware of (such as getting characters to make gasping noises or moans by placing commas in between certain phonemes). | |||
It also doesn't let me get away with ] and ] (which I believe is a big problem endemic in ] today—it's disingenuous and misleading). Being able to interact with the model with no filter allows the user to judge exactly how good the current work is at face value. | |||
|author=15ai, ''Hacker News''<ref name="hn"/> | |||
}} | |||
15.ai was conceived in 2016 as a research project in ] by a developer known as ''"15"'' (at the age of 18<ref name="Twitter"/>) during their ] year at the ] (MIT){{sfnm|Chandraseta|2021|Temitope|2024}} as part of MIT's ] (UROP).{{sfn|Chandraseta|2021}} The developer was inspired by ]'s ] paper, with development continuing through their studies as ] released Tacotron the following year. By 2019, the developer had demonstrated at MIT their ability to replicate WaveNet and Tacotron's results using 75% less training data than previously required.{{sfn|Temitope|2024}} The name ''15'' is a reference to the creator's claim that a voice can be cloned with as little as 15 seconds of data.{{sfnm|Chandraseta|2021|Button|2021}} | |||
The developer had originally planned to pursue a ] based on their undergraduate research, but opted to work in the ] instead after their ] was accepted into the ] accelerator in 2019. After their departure in early 2020, the developer returned to their voice synthesis research, implementing it as a ]. According to the developer, instead of using conventional voice datasets like LJSpeech that contained simple, monotone recordings, they sought out more challenging voice samples that could demonstrate the model's ability to handle complex speech patterns and emotional undertones.<ref name="Twitter"/> The Pony Preservation Project—a fan initiative originating from /mlp/,{{sfn|Temitope|2024}} ]'s '']'' board, that had compiled voice clips from '']''—played a crucial role in the implementation. The project's contributors had manually trimmed, denoised, transcribed, and emotion-tagged every line from the show. This dataset provided ideal training material for 15.ai's deep learning model.{{sfn|Temitope|2024}}<ref name="Twitter">{{cite web |title=The past and future of 15.ai |url=https://x.com/fifteenai/status/1865439846744871044 |website=] |access-date=December 19, 2024 |archive-date=December 8, 2024 |archive-url=https://web.archive.org/web/20241208035548/https://x.com/fifteenai/status/1865439846744871044 }}</ref> | |||
The algorithm used by the project to facilitate the cloning of voices with minimal viable data has been dubbed '''DeepThroat'''<ref name="15aiabout">{{cite web | |||
|url= https://15.ai/about | |||
|title= 15.ai – About | |||
|last= | |||
|first= | |||
|date= 2022-02-20 | |||
|website= 15.ai | |||
|publisher= | |||
|access-date= 2022-02-20 | |||
|quote= | |||
}}</ref> (a ] in reference to ] using ] and the sexual act of ]). The project and algorithm—initially conceived as part of MIT's ]—had been in development for years before the first release of the application.<ref name="towardds"/><ref name="automaton"/> | |||
] sequence that encodes speaker information.]] | |||
]'s /mlp/ board has been integral to the development of 15.ai.<ref name="gwern"/>]] | |||
15.ai was released in March 2020 with a limited selection of characters, including those from '']'' and '']''.{{sfn|Ng|2020}}<ref name="fifteen.ai-2020">{{multiref | |||
|{{cite web |title=About |url=https://fifteen.ai/about |website=fifteen.ai |access-date=December 23, 2024 |archive-url=https://archive.today/20200229215215/https://fifteen.ai/about |archive-date=February 29, 2020 |date=February 19, 2020 |type=Official website |quote=2020-02-19: The web app isn't fully ready just yet |ref=fifteen.ai-2020a}} | |||
|{{cite web |title=About |url=https://fifteen.ai/about |website=fifteen.ai |access-date=December 23, 2024 |archive-url=https://archive.today/20200303041821/https://fifteen.ai/about |archive-date=March 3, 2020 |date=March 2, 2020 |type=Official website |ref=fifteen.ai-2020b}} | |||
<!-- multiref end-->}}</ref> More voices were added to the website in the following months.{{sfnm|Scotellaro|2020a|Scotellaro|2020b}} A significant technical advancement came in late 2020 with the implementation of a multi-speaker ] in the deep neural network, enabling simultaneous training of multiple voices rather than requiring individual models for each character voice.{{sfn|Temitope|2024}} This not only allowed rapid expansion from eight to over fifty character voices,<ref name="Twitter"/> but also let the model recognize common emotional patterns across characters, even when certain emotions were missing from some characters' training data.{{sfnm|Kurosawa|2021|Temitope|2024}} | |||
The developer has also worked closely with the Pony Preservation Project from /mlp/, the '']'' ] of ]. The '''Pony Preservation Project''', which began in 2019, is a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence.<ref name="gwern">{{cite journal | |||
|url= https://www.gwern.net/docs/ai/music/index#15-project-2020-section | |||
|title= "15.ai", 15, Pony Preservation Project | |||
|last= Branwen | |||
|first= Gwern | |||
|date= 2020-03-06 | |||
|website= Gwern.net | |||
|publisher= Gwern | |||
|access-date= 2022-06-17 | |||
|url-status= live | |||
|archive-date= 2022-03-18 | |||
|archive-url= https://web.archive.org/web/20220318160737/https://www.gwern.net/docs/ai/music/index#15-project-2020-section | |||
}}</ref><ref>{{cite web | |||
|url= https://www.equestriadaily.com/2020/03/neat-pony-preservation-project-using.html | |||
|title= Neat "Pony Preservation Project" Using Neural Networks to Create Pony Voices | |||
|last= Scotellaro | |||
|first= Shaun | |||
|date= 2020-03-14 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-11 | |||
|archive-date= 2021-06-23 | |||
|url-status= live | |||
|archive-url= https://web.archive.org/web/20210623210048/https://www.equestriadaily.com/2020/03/neat-pony-preservation-project-using.html | |||
}}</ref><ref name="ppp"> | |||
{{cite web | |||
|url= https://desuarchive.org/mlp/thread/38204261/ | |||
|title= Pony Preservation Project (Thread 108) | |||
|last= | |||
|first= | |||
|date= 2022-02-20 | |||
|website= ] | |||
|publisher= Desuarchive | |||
|access-date= 2022-02-20 | |||
|quote= }}</ref> The ''Friendship Is Magic'' voices on 15.ai were trained on a large dataset ]d by the Pony Preservation Project: audio and dialogue from the show and related media—including ], ], ], ], and various other content voiced by the same voice actors—were ], ], and ] to remove background noise. According to the developer, the collective efforts and constructive criticism from the Pony Preservation Project have been integral to the development of 15.ai.<ref name="gwern"/> | |||
In early 2021, the application went viral on ] and ], with people generating skits, ], and fan content using voices from popular games and shows that have accumulated millions of views on social media.{{sfnm|Zwiezen|2021|Clayton|2021|Ruppert|2021|Morton|2021|Kurosawa|2021|Yoshiyuki|2021}} Content creators, ], and ] have also used 15.ai as part of their videos as ].{{sfn|Play.ht|2024|ref=Play.ht-2024}}{{unreliable source?|date=January 2025}} At its peak, the platform incurred operational costs of {{Currency|12000|United States}}{{sfn|Temitope|2024}} per month from ] infrastructure needed to handle millions of daily voice generations; despite receiving offers from companies to ] 15.ai and its underlying technology, the website remained independent and was funded by the developer, then aged 23,<ref name="Twitter"/> out of personal earnings from a previous startup.{{sfn|Temitope|2024}} | |||
In addition, the developer has stated that the logo of 15.ai, which features a robotic ], is an homage to the fact that her voice (as originally portrayed by ]) was indispensable to the implementation of emotional contextualizers.<ref name="hn"/> | |||
=== Voiceverse NFT controversy === | |||
== Reception == | |||
] wrote that the technology behind 15.ai could potentially open up to cases of ].]] | |||
15.ai has been met with largely positive reception. Liana Ruppert of '']'' described 15.ai as "simplistically brilliant."<ref name="gameinformer"/> Lauren Morton of '']'' and Natalia Clayton of '']'' called it "fascinating,"<ref name="rockpapershotgun"/><ref name="pcgamer"/> and José Villalobos of '']'' wrote that it "works as easy as it looks."<ref name="LaPS4"/>{{efn|Translated from original quote written in Spanish: ''"La dirección es 15.AI y funciona tan fácil como parece."''<ref name="LaPS4"/>}} Users praised the ability to easily create audio of popular characters that sound believable to those unaware that the voices had been synthesized by artificial intelligence: Zack Zwiezen of '']'' reported that " girlfriend was convinced it was a new voice line from ]' voice actor, ],"<ref name="kotaku"/> while Rionaldi Chandraseta of '']'' wrote that, upon watching a ] video featuring popular character voices generated by 15.ai, " first thought was the video creator used ] to pay for new dialogues from the original voice actors" and stated that "the quality of voices done by 15.ai is miles ahead of ."<ref name="towardds"/> | |||
The application has also been acclaimed overseas, especially in Japan. Takayuki Furushima of ''Den Fami Nico Gamer'' described 15.ai as "like magic," and Yuki Kurosawa of ''Automaton Media'' called it "revolutionary."<ref name="Denfaminicogamer"/><ref name="automaton"/> | |||
Computer scientist and technology entrepreneur ] commented in his newsletter '']'' that the technology behind 15.ai could be "enormously productive" and could "revolutionize the use of ]s"; however, he also noted that "synthesizing a human actor's voice without consent is arguably unethical and possibly illegal" and could potentially open up to cases of ].<ref name="thebatch"/><ref name="batch"/> In his blog '']'', ] ] deemed 15 one of the "most underrated talents in AI and machine learning."<ref>{{cite web | |||
|url= https://marginalrevolution.com/marginalrevolution/2022/05/the-most-underrated-talent-in-ai.html | |||
|title= The most underrated talent in AI? | |||
|last= Cowen | |||
|first= Tyler | |||
|date= 2022-05-12 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-06-16 | |||
|url-status= live | |||
|archive-date= 2022-06-19 | |||
|archive-url= https://web.archive.org/web/20220619203626/https://marginalrevolution.com/marginalrevolution/2022/05/the-most-underrated-talent-in-ai.html | |||
}}</ref> | |||
== Impact == | |||
=== Fandom content creation === | |||
<!-- Deleted image removed: ] --> | |||
15.ai has been frequently used for ] in various ]s, including the ], the '']'' fandom, the '']'' fandom, and the '']'' fandom. Numerous videos and projects containing speech from 15.ai have gone ].<ref name="towardds" /><ref name="kotaku" /><ref name="gameinformer" /> However, some videos and projects that contain speech not generated by 15.ai have also gone viral, and many of them do not credit the source of their synthetic speech. As a consequence, many videos and projects made with other speech synthesis software have been mistaken for 15.ai creations, and vice versa. Because of this misattribution, 15.ai's terms of service forbid combining speech generated by 15.ai with speech not generated by 15.ai in the same video or project.<ref name="15ai">{{cite web | |||
|url= https://15.ai/faq | |||
|title= 15.ai – FAQ | |||
|last= | |||
|first= | |||
|date= 2021-01-18 | |||
|website= 15.ai | |||
|publisher= | |||
|access-date= 2021-01-18 | |||
|quote= | |||
}}</ref> | |||
The ''My Little Pony: Friendship Is Magic'' fandom has seen a resurgence in video and musical content creation as a direct result of 15.ai, inspiring a new genre of fan-created content assisted by artificial intelligence. Some ] have been adapted into fully voiced "episodes": ''The Tax Breaks'' is a 17-minute-long animated video rendition of a fan-written story published in 2014 that uses voices generated from 15.ai with ] and ], emulating the episodic style of the early seasons of ''Friendship Is Magic''.<ref name="taxbreaks">{{cite web | |||
|url= https://www.equestriadaily.com/2022/05/full-simple-animated-episode-tax-breaks.html | |||
|title= Full Simple Animated Episode – The Tax Breaks (Twilight) | |||
|last= Scotellaro | |||
|first= Shaun | |||
|date= 2022-05-15 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-05-28 | |||
|quote= | |||
|archive-date= 2022-05-21 | |||
|url-status= live | |||
|archive-url= https://web.archive.org/web/20220521132423/https://www.equestriadaily.com/2022/05/full-simple-animated-episode-tax-breaks.html | |||
}}</ref><ref>{{cite book | |||
|url= https://www.fimfiction.net/story/185725 | |||
|title= The Terribly Taxing Tribulations of Twilight Sparkle | |||
|date= 2014-04-27 | |||
|website= FimFiction.net | |||
|publisher= FimFiction.net | |||
|access-date= 2022-05-28 | |||
|quote= | |||
|archive-date= 2022-06-30 | |||
|url-status= live | |||
|archive-url= https://web.archive.org/web/20220630170105/https://www.fimfiction.net/story/185725 | |||
}}</ref> | |||
Viral videos from the ''Team Fortress 2'' fandom that feature voices from 15.ai include ''Spy is a ]'' (which has gained over 3 million views on YouTube total across multiple videos<ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=TAmhr6Was3E|title=SPY IS A FURRY|work=]|access-date=June 14, 2022|archive-date=June 13, 2022|archive-url=https://web.archive.org/web/20220613094918/https://www.youtube.com/watch?v=TAmhr6Was3E|url-status=live}}</ref><ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=lwQn7ISVV_8|title=Spy is a Furry Animated|work=]|access-date=June 14, 2022|archive-date=June 14, 2022|archive-url=https://web.archive.org/web/20220614203255/https://www.youtube.com/watch?v=lwQn7ISVV_8|url-status=live}}</ref><ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=r0FLyW86owo|title= – Spy's Confession – |work=]|access-date=June 14, 2022|archive-date=June 30, 2022|archive-url=https://web.archive.org/web/20220630170113/https://www.youtube.com/watch?v=r0FLyW86owo|url-status=live}}</ref>) and ''The RED Bread Bank'', both of which have inspired ] animated video renditions.<ref name="automaton"/> Other fandoms have used voices from 15.ai to produce viral videos. 
{{As of|July 2022}}, the viral video ''] Struggles'' (which uses voices from ''Friendship Is Magic'') has over 5.5 million views on YouTube;<ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=UPE3vnLY3TE|title=Among Us Struggles|work=]|access-date=July 15, 2022}}</ref> ], ], and ] streamers have also used 15.ai for their videos, such as FitMC's video on the history of ]—one of the oldest running '']'' servers—and datpon3's TikTok video featuring the main characters of ''Friendship Is Magic'', which have 1.4 million and 510 thousand views, respectively.<ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=1V1O2gTdqHw|title=The UPDATED 2b2t Timeline (2010–2020)|work=]|access-date=June 14, 2022|archive-date=June 1, 2022|archive-url=https://web.archive.org/web/20220601085855/https://www.youtube.com/watch?v=1V1O2gTdqHw|url-status=live}}</ref><ref group="tt">{{cite web|url=https://www.tiktok.com/@datpon3/video/6813618431217241350|title=She said " 👹 " |work=]|access-date=July 15, 2022}}</ref> | |||
Some users have created AI ]s using 15.ai and external voice control software. One user on Twitter created their own personal ] desktop assistant using the voice control system ] that is able to boot up applications, utter corresponding random dialogues, and thank the user in response to actions.<ref name="automaton"/><ref name="Denfaminicogamer"/> | |||
=== Troy Baker / Voiceverse NFT plagiarism scandal === | |||
{{See also|Non-fungible token#Plagiarism and fraud}} | |||
{{tweet | |||
|image = Troy Baker SDCC 2019 (48378614692) (cropped).jpg | |||
|name = Troy Baker | |||
|username = TroyBakerVA | |||
|width = 350px | |||
|date = January 14, 2022 | |||
|text = I'm partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create. | |||
We all have a story to tell. | |||
You can hate. | |||
Or you can create. | |||
What'll it be? | |||
|reference = <ref>{{cite tweet |last=Baker |first=Troy |author-link=Troy Baker |user=TroyBakerVA |number=1481869350621437955 |date=January 14, 2022 |title=I'm partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create. We all have a story to tell. You can hate. Or you can create. What'll it be? https://t.co/cfDGi4q0AZ |language=en |access-date=December 7, 2022 |archive-url=https://web.archive.org/web/20220916223855/https://twitter.com/TroyBakerVA/status/1481869350621437955 |archive-date=September 16, 2022 }}</ref> | |||
|left = yes | |||
|reference = <ref group="tweet">{{Cite tweet |user=TroyBakerVA|number=1481869350621437955|date = January 14, 2022 |title=I'm partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP’s they create. We all have a story to tell. You can hate. Or you can create. What’ll it be?}}</ref> | |||
}} | |||
On January 14, 2022, a controversy ensued after it was discovered that Voiceverse NFT, a company with which video game and anime dub voice actor Troy Baker had announced a partnership, had misappropriated voice lines generated from 15.ai as part of their marketing campaign.{{sfnm|Lawrence|2022|Williams|2022|Wright|2022|Temitope|2024}} This came shortly after 15.ai's developer had explicitly stated in December 2021 that they had no interest in incorporating NFTs into their work.{{sfn|Lopez|2022}} Log files showed that Voiceverse had generated audio of characters from '']'' using 15.ai and pitched it up to make the voices unrecognizable in order to market their own platform, in violation of 15.ai's terms of service.{{sfnm|Phillips|2022b|Lopez|2022}} | |||
In December 2021, the developer of 15.ai posted on ] that they had no interest in incorporating ] (NFTs) into their work.<ref name="wccftech"/><ref name="stevivor"/><ref group="tweet">{{Cite tweet |user=fifteenai |number=1470190153188749313|date = December 12, 2021 |title=I have no interest in incorporating NFTs into any aspect of my work. Please stop asking.}}</ref> | |||
Voiceverse claimed that someone in their marketing team used the voice without properly crediting 15.ai; in response, 15 tweeted "Go fuck yourself,"{{sfnm|Wright|2022|Phillips|2022b|fifteenai|2022}} which went viral, amassing hundreds of thousands of retweets and likes on ] in support of the developer.{{sfn|Temitope|2024}} Following continued backlash and the plagiarism revelation, Baker acknowledged that his original announcement tweet ending with "You can hate. Or you can create. What'll it be?" may have been "antagonistic," and on January 31, 2022, announced he would discontinue his partnership with Voiceverse.{{sfnm|Lawrence|2022|Williams|2022}} | |||
On January 14, 2022, it was discovered that Voiceverse NFT, a company that video game and ] ] ] ] announced his partnership with, had plagiarized voice lines generated from 15.ai as part of their marketing campaign.<ref name="nme"/><ref name="stevivor"/><ref name="techtimes"/> ] showed that Voiceverse had generated audio of ] and ] from the show '']'' using 15.ai, pitched them up to make them sound unrecognizable from the original voices, and appropriated them without proper credit to falsely market their own platform—a violation of 15.ai's terms of service.<ref name="eurogamer">{{cite web | |||
|url= https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission | |||
|title= Troy Baker-backed NFT firm admits using voice lines taken from another service without permission | |||
|last= Phillips | |||
|first= Tom | |||
|date= 2022-01-17 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-01-17 | |||
|quote= | |||
|archive-date= 2022-01-17 | |||
|archive-url= https://web.archive.org/web/20220117164033/https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission | |||
|url-status= live | |||
}}</ref><ref name="wccftech">{{cite web | |||
|url= https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/ | |||
|title= Troy Baker-backed NFT firm admits using voice lines taken from another service without permission | |||
|last= Lopez | |||
|first= Ule | |||
|date= 2022-01-16 | |||
|website= Wccftech | |||
|publisher= Wccftech | |||
|access-date= 2022-06-07 | |||
|url-status= live | |||
|archive-date= 2022-01-16 | |||
|archive-url= https://web.archive.org/web/20220116194519/https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/ | |||
}}</ref><ref name="techtimes"/> | |||
=== Inactivity === | |||
In September 2022, 15.ai was taken offline{{sfnm|ElevenLabs|2024a|1ref=ElevenLabs-2024a|Play.ht|2024|2ref=Play.ht-2024}} due to legal issues surrounding ].{{sfn|Temitope|2024}} The creator has suggested a potential future version that would better address copyright concerns from the outset, though the website remains inactive as of 2025.{{sfn|Temitope|2024}} | |||
{{tweet | |||
|image = 15 ai logo transparent.png | |||
|name = 15 | |||
|username = fifteenai | |||
|width = 450px | |||
|date = January 14, 2022 | |||
|text = I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. | |||
After digging through the ], I have evidence that some of the voices that they are taking credit for were indeed generated from my own site. | |||
|reference = <ref group="tweet">{{Cite tweet |user=fifteenai |number=1482055102919757835|date = January 14, 2022 |title=I've been informed that the aforementioned NFT vocal synthesis is actively attempting to appropriate my work for their own benefit. After digging through the log files, I have evidence that some of the voices that they are taking credit for were indeed generated from my own site.}}</ref> | |||
}} | |||
{{tweet | |||
|image = Stadot-008cf0.svg | |||
|name = Voiceverse Origins | |||
|username = VoiceverseNFT | |||
|width = 450px | |||
|date = January 14, 2022 | |||
|text = Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again. | |||
|reference = <ref group="tweet">{{Cite tweet |user=VoiceverseNFT |number=1482067251704434688|date = January 14, 2022 |title=Hey @fifteenai we are extremely sorry about this. The voice was indeed taken from your platform, which our marketing team used without giving proper credit. Chubbiverse team has no knowledge of this. We will make sure this never happens again.}}</ref> | |||
}} | |||
{{tweet | |||
|image = 15 ai logo transparent.png | |||
|name = 15 | |||
|width = 450px | |||
|username = fifteenai | |||
|date = January 14, 2022 | |||
|text = Go fuck yourself. | |||
|reference = <ref group="tweet">{{Cite tweet |user=fifteenai |number=1482088782765576192|date = January 14, 2022 |title=Go fuck yourself.}}</ref> | |||
}} | |||
== Features == | |||
A week prior to the announcement of the partnership with Baker, Voiceverse made a (now-deleted) Twitter post directly responding to a (now-deleted) video posted by Chubbiverse—an NFT platform with which Voiceverse had partnered—showcasing an AI-generated voice and claimed that it was generated using Voiceverse's platform, remarking ''"I wonder who created the voice for this? ;)"''<ref name="nme" /><ref group="tweet">{{Cite tweet |user=VoiceverseNFT |number=1479505176684032000|date = January 7, 2022 |title=I wonder who created the voice for this? ;)|archive-url=https://archive.ph/0FQdJ|archive-date= January 15, 2022}}</ref> A few hours after news of the partnership broke, the developer of 15.ai—having been alerted by another Twitter user asking for his opinion on the partnership, to which he speculated that it "sounds like a scam"<ref group="tweet">{{Cite tweet |user=fifteenai |number=1482024112700723204|date = January 14, 2022 |title=Sounds like a scam}}</ref>—posted ] of log files that proved that a user of the website (with their ] redacted) had submitted inputs of the exact words spoken by the AI voice in the video posted by Chubbiverse,<ref group="tweet">{{Cite tweet |user=fifteenai |number=1482059159092793346|date = January 14, 2022 |title=Give proper credit or remove this post.}}</ref> and subsequently responded to Voiceverse's claim directly, tweeting "Certainly not you :)".<ref name="eurogamer" /><ref name="stevivor" /><ref group="tweet">{{Cite tweet |user=fifteenai |number=1482059305360797702|date = January 14, 2022 |title=Certainly not you :)}}</ref> | |||
The platform was non-commercial,{{sfn|Williams|2022}} and operated without requiring user registration or accounts.{{sfn|Phillips|2022b}} Users generated speech by inputting text and selecting a character voice, with optional parameters for emotional contextualizers and phonetic transcriptions. Each request produced three audio variations with distinct emotional deliveries sorted by ] score.{{sfnm|Chandraseta|2021|Menor|2024}} Characters available included multiple characters from '']'' and '']''; ], ], and the ] from the '']'' series; ]; Kyu Sugardust from '']'', ] from '']''; ] and ] from ]; ] from '']''; ] from '']''; ] from '']''; ] and multiple characters from '']''; the ]; ]; and ] from '']''.{{sfnm|Zwiezen|2021|Clayton|2021|Morton|2021|Ruppert|2021|Villalobos|2021|Yoshiyuki|2021|Kurosawa|2021}} Out of the over fifty<ref name="Twitter"/> voices available, thirty were of characters from '']''.{{sfn|Scotellaro|2020b}} Certain "silent" characters like ] and ] were able to be selected as a joke, and would emit silent audio files when any text was submitted.{{sfnm|Morton|2021|遊戲|2021}} | |||
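Input text could carry an optional emotional contextualizer after a vertical bar. The following is a minimal illustrative sketch of how such an input might be split into its two parts. It is not 15.ai's actual code: the names <code>ParsedInput</code> and <code>parse_input</code> are hypothetical, and splitting on the first bar and trimming whitespace are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedInput:
    """Hypothetical container for one request; not part of 15.ai."""
    text: str               # sentence the selected character speaks
    context: Optional[str]  # emotional contextualizer, or None if absent


def parse_input(raw: str) -> ParsedInput:
    """Split 'text|context' on the first vertical bar (an assumption)."""
    text, sep, context = raw.partition("|")
    context = context.strip()
    # Treat a missing bar or an empty right-hand side as "no contextualizer".
    return ParsedInput(text=text.strip(), context=context if sep and context else None)


if __name__ == "__main__":
    print(parse_input("Today is a great day!|I'm very sad."))
    print(parse_input("Today is a great day!"))
```

In this sketch, an input without a bar simply has no contextualizer, so the spoken text would have to supply its own emotional cue.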
Following the tweet, Voiceverse admitted to plagiarizing voices from 15.ai as their own platform, claiming that their ] team had used the project without giving proper credit and that the "Chubbiverse team no knowledge of this." In response to the admission, 15 tweeted "]."<ref name="nme" /><ref name="stevivor"/><ref name="techtimes" /><ref name="eurogamer"/> The final tweet went ], accruing over 75,000 total likes and 13,000 total retweets across multiple reposts.<ref group="tweet">{{Cite tweet |user=fifteenai |number=1482088782765576192|date = January 14, 2022 |title=Go fuck yourself.}}</ref><ref group="tweet">{{Cite tweet |user=yongyea |number=1482119084183474178|date = January 14, 2022 |title=The NFT scheme that Troy Baker is promoting is already finding itself in trouble after stealing and profiting off of somebody else's work. Who could've seen this coming.}}</ref><ref group="tweet">{{Cite tweet |user=BronyStruggle |number=1482468865368072195|date = January 15, 2022 |title=actual}}</ref> | |||
] | |||
The initial partnership between Baker and Voiceverse was met with severe backlash and universally negative reception.<ref name="nme"/> Critics highlighted the ] and potential for ]s associated with NFT sales.<ref name="eurogamer2">{{cite web | |||
The deep learning model's nondeterministic properties produced variations in speech output, creating different intonations with each generation, similar to how ] produce different takes.{{sfn|Yoshiyuki|2021}} 15.ai introduced the concept of '''emotional contextualizers,''' which allowed users to specify the emotional tone of generated speech through guiding phrases.{{sfn|Temitope|2024}} The emotional contextualizer functionality utilized DeepMoji, a sentiment analysis neural network developed at the ].{{sfnm|Kurosawa|2021|Chandraseta|2021}} Introduced in 2017, DeepMoji processed ] embeddings from 1.2 billion Twitter posts (from 2013 to 2017) to analyze emotional content. Testing showed the system could identify emotional elements, including sarcasm, more accurately than human evaluators.{{sfn|Knight|2017}} If an input into 15.ai contained additional context (specified by a vertical bar), the additional context following the bar would be used as the emotional contextualizer.{{sfn|Chandraseta|2021}} For example, if the input was <code>Today is a great day!|I'm very sad.</code>, the selected character would speak the sentence "Today is a great day!" in the emotion one would expect from someone saying the sentence "I'm very sad."{{sfn|Chandraseta|2021}} | |||
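The vertical-bar convention described above can be sketched as a small input parser. This is only an illustration and not 15.ai's actual code: the helper name <code>split_contextualizer</code> is hypothetical, and returning no contextualizer when the bar is absent is an assumption, since the sources do not say how such input was handled.

```python
from typing import NamedTuple, Optional


class ParsedInput(NamedTuple):
    text: str                      # sentence to be spoken
    contextualizer: Optional[str]  # phrase that sets the emotional tone, if any


def split_contextualizer(raw: str) -> ParsedInput:
    """Split 'text|context' at the first vertical bar (hypothetical helper)."""
    text, bar, context = raw.partition("|")
    context = context.strip()
    # Assumption: with no bar (or an empty context) no contextualizer is set.
    return ParsedInput(text.strip(), context if bar and context else None)


parsed = split_contextualizer("Today is a great day!|I'm very sad.")
print(parsed.text)            # Today is a great day!
print(parsed.contextualizer)  # I'm very sad.
```

In this sketch the text before the bar is what the character speaks, while the text after it would only be passed to the sentiment model.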
|url= https://www.eurogamer.net/articles/2022-01-14-video-game-voice-actor-troy-baker-is-now-promoting-nfts | |||
|title= Video game voice actor Troy Baker is now promoting NFTs | |||
|last= Phillips | |||
|first= Tom | |||
|date= 2022-01-14 | |||
|website= ] | |||
|publisher= ] | |||
|access-date= 2022-01-14 | |||
|quote= | |||
|archive-date= 2022-01-14 | |||
|archive-url= https://web.archive.org/web/20220114104215/https://www.eurogamer.net/articles/2022-01-14-video-game-voice-actor-troy-baker-is-now-promoting-nfts | |||
|url-status= live | |||
}}</ref> Commentators also pointed out the irony in Baker's initial Tweet announcing the partnership, which ended with "You can hate. Or you can create. What'll it be?", hours before the public revelation that the company in question had resorted to theft instead of creating their own product. Baker responded that he appreciated people sharing their thoughts and their responses were "giving a lot to think about."<ref>{{Cite web|last=McWhertor|first=Michael|date=2022-01-14|title=The Last of Us voice actor wants to sell 'voice NFTs,' drawing ire|url=https://www.polygon.com/22883752/troy-baker-nfts-voice-last-of-us-bioshock|access-date=2022-01-14|website=Polygon|language=en-US|archive-date=2022-01-14|archive-url=https://web.archive.org/web/20220114174747/https://www.polygon.com/22883752/troy-baker-nfts-voice-last-of-us-bioshock|url-status=live}}</ref><ref>{{Cite web|title=Last Of Us Voice Actor Pisses Everyone Off With NFT Push|url=https://kotaku.com/last-of-us-voice-actor-pisses-everyone-off-with-nft-pus-1848360093|access-date=2022-01-14|website=Kotaku|date=January 14, 2022|language=en-us|archive-date=2022-01-14|archive-url=https://web.archive.org/web/20220114154523/https://kotaku.com/last-of-us-voice-actor-pisses-everyone-off-with-nft-pus-1848360093|url-status=live}}</ref> He also acknowledged that the "hate/create" part in his initial Tweet might have been "a bit antagonistic," and asked fans on social media to forgive him.<ref name="stevivor"/><ref>{{Cite web|last=Purslow|first=Matt|date=2022-01-14|title=Troy Baker Is Working With NFTs, but Fans Are Unimpressed|url=https://www.ign.com/articles/troy-baker-nft-voiceverse|access-date=2022-01-14|website=IGN|language=en|archive-date=2022-01-14|archive-url=https://web.archive.org/web/20220114130245/https://www.ign.com/articles/troy-baker-nft-voiceverse|url-status=live}}</ref> Two weeks later, on January 31, Baker announced that he would discontinue his partnership with Voiceverse.<ref name="tweaktown">{{cite web | |||
|url= https://www.tweaktown.com/news/84299/last-of-us-actor-troy-baker-heeds-fans-abandons-nft-plans/index.html | |||
|title= Last of Us actor Troy Baker heeds fans, abandons NFT plans | |||
|last= Strickland | |||
|first= Derek | |||
|date= 2022-01-31 | |||
|website= Tweaktown | |||
|access-date= 2022-01-31 | |||
|quote= | |||
|archive-date= 2022-01-31 | |||
|archive-url= https://web.archive.org/web/20220131172752/https://www.tweaktown.com/news/84299/last-of-us-actor-troy-baker-heeds-fans-abandons-nft-plans/index.html | |||
|url-status= live | |||
}}</ref><ref name="wgtc">{{cite web | |||
|url= https://wegotthiscovered.com/gaming/the-last-of-us-actor-troy-baker-reverses-course-on-nfts-amid-fan-backlash/ | |||
|title= 'The Last of Us' actor Troy Baker reverses course on NFTs amid fan backlash | |||
|last= Peterson | |||
|first= Danny | |||
|date= 2022-01-31 | |||
|website= We Got This Covered | |||
|access-date= 2022-02-14 | |||
|quote= | |||
|archive-date= 2022-02-14 | |||
|archive-url= https://web.archive.org/web/20220214191046/https://wegotthiscovered.com/gaming/the-last-of-us-actor-troy-baker-reverses-course-on-nfts-amid-fan-backlash/ | |||
|url-status= live | |||
}}</ref><ref>{{Cite web|last=Peters|first=Jay|date=2022-01-31|title=The voice of Joel from The Last of Us steps away from NFT project after outcry|url=https://www.theverge.com/2022/1/31/22910633/troy-baker-voiceverse-nft-voice-actor-project-the-last-of-us|access-date=2022-02-04|website=The Verge|language=en|archive-date=2022-02-04|archive-url=https://web.archive.org/web/20220204042246/https://www.theverge.com/2022/1/31/22910633/troy-baker-voiceverse-nft-voice-actor-project-the-last-of-us|url-status=live}}</ref> | |||
]" into speech, starting from ]. English words are parsed as a string of ARPABET phonemes, which is then passed through a pitch predictor and a ] generator to generate audio.]]
===Reactions from voice actors=== | |||
The application used pronunciation data from ], ], and ],{{sfn|Kurosawa|2021}} the last of which is based on ], a set of English phonetic transcriptions originally developed by the ] in the 1970s. For modern and Internet-specific terminology, the system incorporated pronunciation data from ] websites, including ], ], ], and ].{{sfn|Kurosawa|2021}} Inputting ARPABET transcriptions was also supported, allowing users to correct mispronunciations or specify the desired pronunciation between ], words that have the same spelling but different pronunciations. Users could invoke ARPABET transcriptions by enclosing the phoneme string in curly braces within the input box (for example, <code>{AA1 R P AH0 B EH2 T}</code> to specify the pronunciation of the word "ARPABET" ({{IPAc-en|ˈ|ɑːr|p|ə|ˌ|b|ɛ|t}} {{respell|AR|pə|beht}})).{{sfnm|Kurosawa|2021|Temitope|2024}} The interface displayed parsed words with color-coding to indicate pronunciation certainty: green for words found in the existing pronunciation lookup table, blue for manually entered ARPABET pronunciations, and red for words where the pronunciation had to be algorithmically predicted.{{sfnm|www.equestriacn.com|2021|ref=www.equestriacn.com|Kurosawa|2021}}
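The curly-brace syntax and the three-color scheme described above can be illustrated with a short tokenizer. This is a sketch under stated assumptions, not the site's implementation: the <code>LOOKUP</code> table is a one-entry stand-in for the real pronunciation dictionaries, and the <code>classify</code> function and its regular expression are hypothetical.

```python
import re

# Hypothetical stand-in for the pronunciation lookup table.
# The single entry uses the ARPABET transcription quoted above.
LOOKUP = {"arpabet": "AA1 R P AH0 B EH2 T"}

# Matches either a braced phoneme string or a run of letters/apostrophes.
TOKEN = re.compile(r"\{([^{}]*)\}|([A-Za-z']+)")


def classify(text, lookup=LOOKUP):
    """Return (token, color) pairs following the color scheme described above."""
    result = []
    for match in TOKEN.finditer(text):
        phonemes, word = match.groups()
        if phonemes is not None:
            result.append((phonemes.strip(), "blue"))  # manually entered ARPABET
        elif word.lower() in lookup:
            result.append((word, "green"))             # found in the lookup table
        else:
            result.append((word, "red"))               # would need to be predicted
    return result


print(classify("ARPABET {AA1 R P AH0 B EH2 T} zxqv"))
```

Here the braced phoneme string is treated as a single token, so a manual transcription always takes precedence over any dictionary lookup.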
Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about ], unauthorized use of an actor's voice in ], and the potential of ].<ref name="thebatch"/><ref name="batch">{{cite web | |||
|url= https://read.deeplearning.ai/the-batch/issue-83/ | |||
<!-- ] sequence that encodes speaker information.]] --> | |||
|title= Weekly Newsletter Issue 83 | |||
Later versions of 15.ai introduced multi-speaker capabilities. Rather than training separate models for each voice, 15.ai used a unified model that learned multiple voices simultaneously through speaker ], learned numerical representations that captured each character's unique vocal characteristics.{{sfn|Temitope|2024}}<ref name="Twitter" /> Along with the emotional context conferred by DeepMoji, this neural network architecture enabled the model to learn shared patterns across different characters' emotional expressions and speaking styles, even when individual characters lacked examples of certain emotional contexts in their training data.{{sfnm|Kurosawa|2021|Temitope|2024}}
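One common way to condition a single model on both a speaker and an emotion is to append the two vectors to every frame of phoneme features. The sketch below shows that generic technique only; it is an assumption for illustration and not a description of 15.ai's architecture, which the sources do not detail. All names, dimensions, and values are hypothetical, and the random vectors stand in for embeddings that a real model would learn during training.

```python
import random

EMBED_DIM = 4  # illustrative size only


def make_speaker_table(names, dim=EMBED_DIM, seed=0):
    """One vector per character; in a real model these are learned, not random."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


def condition(phoneme_features, speaker_vec, emotion_vec):
    """Append the same speaker and emotion vectors to every phoneme frame."""
    return [frame + speaker_vec + emotion_vec for frame in phoneme_features]


speakers = make_speaker_table(["character_a", "character_b"])
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # placeholder phoneme features
emotion = [0.0, 1.0]                           # placeholder emotion vector

conditioned = condition(frames, speakers["character_a"], emotion)
print(len(conditioned), len(conditioned[0]))   # 3 8
```

Because the emotion vector is shared across speakers in this arrangement, patterns learned from one character's emotional speech can in principle transfer to another character.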
|last= Ng | |||
|first= Andrew | |||
The interface included technical metrics and graphs,{{sfn|www.equestriacn.com|2021|ref=www.equestriacn.com}} which, according to the developer, served to highlight the research aspect of the website.<ref name="Twitter" /> As of version v23, released in September 2021, the interface displayed comprehensive model analysis information, including word parsing results and emotional analysis data. The ] and ] (GAN) hybrid ] and ], introduced in an earlier version, was streamlined to remove manual parameter inputs.{{sfn|www.equestriacn.com|2021|ref=www.equestriacn.com}} | |||
|date= 2021-03-07 | |||
|website= The Batch | |||
== Reception == | |||
|publisher= The Batch | |||
=== Critical reception === | |||
|access-date= 2021-03-07 | |||
Critics described 15.ai as easy to use and generally able to convincingly replicate character voices, with occasional mixed results.{{sfnm|Clayton|2021|Ruppert|2021|Moto|2021|Scotellaro|2020c|Villalobos|2021}} Natalie Clayton of '']'' wrote that ]' voice was replicated well, but noted challenges in mimicking the ] from the '']'': "the algorithm simply can't capture ]'s whimsically droll intonation."{{sfn|Clayton|2021}} Zack Zwiezen of '']'' reported that " girlfriend was convinced it was a new voice line from GLaDOS' voice actor, ]".{{sfn|Zwiezen|2021}} Rionaldi Chandraseta of AI newsletter ''Towards Data Science'' observed that "characters with large training data produce more natural dialogues with clearer inflections and pauses between words, especially for longer sentences."{{sfn|Chandraseta|2021}} Taiwanese newspaper '']'' also highlighted 15.ai's ability to recreate GLaDOS's mechanical voice, alongside its diverse range of character voice options.{{sfn|遊戲|2021}} ''] Taiwan'' reported that "GLaDOS in ''Portal'' can pronounce lines nearly perfectly", but also criticized that "there are still many imperfections, such as word limit and tone control, which are still a little weird in some words."{{sfn|MrSun|2021}} Chris Button of AI newsletter ''Byteside'' called the ability to clone a voice with only 15 seconds of data "freaky" but also called the tech behind it "impressive".{{sfn|Button|2021}} The platform's voice generation capabilities were regularly featured on '']'', a ] dedicated to the show '']'' and its other generations, with documented updates, fan creations, and additions of new character voices.{{sfnm|Scotellaro|2020a|Scotellaro|2020b|Scotellaro|2020c|Scotellaro|2020d|Scotellaro|2020e|Scotellaro|2020f}} In a post introducing new character additions to 15.ai, ''Equestria Daily'''s founder ], also known by his online moniker "Sethisto", wrote that "some of aren't great due to the lack of samples to draw from, but many are really impressive still anyway."{{sfnm|Scotellaro|2020b}}
|quote= | |||
|archive-date= 2022-02-26 | |||
Several other critics found the word count limit, prosody options, and English-only nature of the application not entirely satisfactory.{{sfn|GamerSky|2021}}{{sfn|MrSun|2021}} Peter Paltridge of ] and ] news outlet ''Anime Superhero News'' opined that "voice synthesis has evolved to the point where the more expensive efforts are nearly indistinguishable from actual human speech," but also noted that "In some ways, ] is still more advanced than this. It was possible to affect SAM’s inflections by using special characters, as well as change his pitch at will. With 15.ai, you’re at the mercy of whatever random inflections you get."{{sfn|Paltridge|2021}} Conversely, Lauren Morton of '']'' praised the depth of pronunciation control "if you're willing to get into the nitty gritty of it".{{sfn|Morton|2021}} Similarly, Eugenio Moto of Spanish news website ''Qore.com'' wrote that "the most experienced can change parameters like the stress or the tone."{{sfn|Moto|2021}} Takayuki Furushima of '']'' highlighted the "smooth pronunciations", and Yuki Kurosawa of '']'' noted its "rich emotional expression" as a major feature; both Japanese authors noted the lack of Japanese-language support.<ref>{{harvnb|Yoshiyuki|2021}}: 日本語入力には対応していないが、ローマ字入力でもなんとなくそれっぽい発音になる。; 15.aiはテキスト読み上げサービスだが、特筆すべきはそのなめらかな発音と、ゲームに登場するキャラクター音声を再現している点だ。 ({{translation|i=yes}} It does not support Japanese input, but even if you input using romaji, it will somehow give you a similar pronunciation.; 15.ai is a text-to-speech service, but what makes it particularly noteworthy is its smooth pronunciation and the fact that it reproduces the voices of characters that appear in games.)</ref>{{sfn|Kurosawa|2021}} Renan do Prado of the Brazilian gaming news outlet ''Arkade'' and José Villalobos of Spanish gaming outlet ''LaPS4'' pointed out that while users could create amusing results in Portuguese and Spanish respectively, the generation performed best in English.{{sfnm|do Prado|2021|Villalobos|2021}} 
Chinese gaming news outlet '']'' called the app "interesting", but also criticized the word count limit of the text and the lack of intonations.{{sfn|GamerSky|2021}} South Korean video game outlet ''Zuntata'' wrote that "the surprising thing about 15.ai is that , there's only about 30 seconds of data, but it achieves pronunciation accuracy close to 100%".{{sfn|zuntata.tistory.com|2021|ref=Tistory-2021}} Machine learning professor Yongqiang Li wrote in his blog that he was surprised to see that the application was free.{{sfn|Li|2021}} | |||
|archive-url= https://web.archive.org/web/20220226175907/https://read.deeplearning.ai/the-batch/issue-83/ | |||
|url-status= live | |||
=== Ethical concerns === | |||
}}</ref><ref name="wccftech"/> | |||
{{See also|Deepfake#Concerns and countermeasures}} | |||
]s had mixed reactions to 15.ai's capabilities. While some industry professionals acknowledged the technical innovation, others raised concerns about the technology's implications for their profession.{{sfnm|Phillips|2022a|Temitope|2024|Menor|2024}} When voice actor ] announced his partnership with Voiceverse NFT, which had misappropriated 15.ai's technology, it sparked widespread controversy within the voice acting industry.{{sfnm|Lawrence|2022|Phillips|2022a|Wright|2022}} Critics raised concerns about automated voice acting's potential ] for voice actors, risk of ], and potential ].{{sfnm|Phillips|2022a|Menor|2024}} The controversy surrounding Voiceverse NFT and subsequent discussions highlighted broader industry concerns about AI voice synthesis technology.{{sfnm|Phillips|2022a|Lawrence|2022}} | |||
While 15.ai limited its scope to fictional characters and did not reproduce voices of real people or celebrities,{{sfnm|fifteenai|2020|1ref=fifteen.ai-2020b|Menor|2024}} computer scientist ] noted that similar technology could be used to do so, including for nefarious purposes.{{sfnm|Ng|2020}} In his 2020 assessment of 15.ai, he wrote: | |||
{{Quote|"Voice cloning could be enormously productive. In ], it could revolutionize the use of virtual actors. In cartoons and audiobooks, it could enable voice actors to participate in many more productions. In online education, kids might pay more attention to lessons delivered by the voices of favorite personalities. And how many YouTube how-to video producers would love to have a synthetic ] narrate their scripts?"}}
However, he also wrote: | |||
{{Quote|"...but synthesizing a human actor's voice without consent is arguably unethical and possibly illegal. And this technology will be catnip for deepfakers, who could scrape recordings from social networks to impersonate private individuals."{{sfn|Ng|2020}}}} | |||
== Legacy == | |||
15.ai was an early pioneer of audio deepfakes, leading to the emergence of AI speech synthesis-based memes during the initial stages of the ] in 2020.<ref>{{harvnb|MrSun|2021}}: 大家是否都曾經想像過,假如能讓自己喜歡的遊戲或是動畫角色說出自己想聽的話,不論是名字、惡搞或是經典名言,都是不少人的夢想吧。不過來到 2021 年,現在這種夢想不再是想想而已,因為有一個網站通過 AI 生成的技術,讓大家可以讓不少遊戲或是動畫角色,說出任何你想要他們講出的東西,而且相似度與音調都有相當高的準確度 ({{translation|i=yes}} Have you ever imagined what it would be like if your favorite game or anime characters could say exactly what you want to hear? Whether it's names, parodies, or classic quotes, this is a dream for many. However, as we enter 2021, this dream is no longer just a fantasy, because there is a website that uses AI-generated technology, allowing users to make various game and anime characters say anything they want with impressive accuracy in both similarity and tone).</ref>{{sfn|Anirudh VK|2023}} 15.ai is credited as the first mainstream platform to popularize AI voice cloning in ]s and content creation, particularly through its ability to generate convincing character voices in real-time without requiring extensive technical expertise.{{sfnm|Temitope|2024|Morton|2021}} The platform's impact was especially notable in fan communities, including the ], '']'', '']'', and '']'' fandoms, where it enabled the creation of viral content that garnered millions of views across social media platforms like ] and ].{{sfnm|Scotellaro|2020c|遊戲|2021|Kurosawa|2021|Morton|2021|Temitope|2024}} ''Team Fortress 2'' content creators also used the platform to produce both short-form memes and complex narrative animations using ].{{sfnm|Clayton|2021|Zwiezen|2021|Morton|2021}} Fan creations included skits and new fan animations,{{sfnm|Morton|2021|Kurosawa|2021}} crossover content—such as '']'' writer Liana Ruppert's demonstration combining ''Portal'' and '']'' dialogue in her coverage of the platform{{sfn|Ruppert|2021}}—recreations of viral videos (including the infamous ]{{sfnm|Zwiezen|2021|Morton|2021}}), adaptations of ] using AI-generated 
character voices,{{sfn|Scotellaro|2020d}} music videos and new musical compositions—such as the ] ''Pony Zone'' series{{sfn|Scotellaro|2020e}}—and content where characters recited ].{{sfnm|Zwiezen|2021|Ruppert|2021}} Some fan creations gained mainstream attention, such as a viral edit replacing ]'s cameo in '']'' with the ]'s AI-generated voice, which was featured on a daytime ] segment in January 2021.{{sfnm|Clayton|2021|CNN|2021}}<ref>{{cite web |url=https://www.reddit.com/r/tf2/comments/l0cuwh/the_heavy_on_cnn/ |title=The Heavy on CNN |website=] |date=January 19, 2021 |access-date=December 31, 2024}}</ref> Some users integrated 15.ai's voice synthesis with VoiceAttack, a voice command software, to create personal assistants.{{sfn|Yoshiyuki|2021}} | |||
Its influence has been noted in the years after it became defunct,{{sfn|Wright|2023}} with several commercial alternatives emerging to fill the void, such as ]{{efn|which uses "11.ai" as a legal byname for its web domain{{sfn|ElevenLabs|2024b|ref=ElevenLabs-2024b}}}} and ].{{sfnm|ElevenLabs|2024a|1ref=ElevenLabs-2024a|Play.ht|2024|2ref=Play.ht-2024}} Contemporary generative voice AI companies have acknowledged 15.ai's pioneering role. PlayHT called the debut of 15.ai "a breakthrough in the field of text-to-speech (TTS) and speech synthesis".{{sfn|Play.ht|2024|ref=Play.ht-2024}} ], the founder and CEO of ], credited 15.ai for "making AI voice cloning popular for content creation by being the first to feature popular existing characters from fandoms".{{sfn|Speechify|2024}} Mati Staniszewski, the founder and CEO of ], wrote that 15.ai was transformative in the field of ].{{sfn|ElevenLabs|2024a|ref=ElevenLabs-2024a}} | |||
Prior to its shutdown, 15.ai established several technical precedents that influenced subsequent developments in AI voice synthesis. Its integration of ] for emotional analysis demonstrated the viability of incorporating sentiment-aware speech generation, while its support for ] ]s set a standard for precise pronunciation control in public-facing voice synthesis tools.{{sfn|Temitope|2024}} The platform's unified multi-speaker model, which enabled simultaneous training of diverse character voices, proved particularly influential. This approach allowed the system to recognize emotional patterns across different voices even when certain emotions were absent from individual character training sets; for example, if one character had examples of joyful speech but no angry examples, while another had angry but no joyful samples, the system could learn to generate both emotions for both characters by understanding the common patterns of how emotions affect speech.{{sfnm|Kurosawa|2021|Temitope|2024}} | |||
15.ai also made a key contribution in reducing training data requirements for speech synthesis. Earlier systems like ]'s Tacotron and ]'s FastSpeech required tens of hours of audio to produce acceptable results and failed to generate intelligible speech with less than 24 minutes of training data.<ref name="Google"/>{{sfn|Ren|2019}} In contrast, 15.ai demonstrated the ability to generate speech with substantially less training data; the name "15.ai" refers to the creator's claim that a voice could be cloned with just 15 seconds of data.{{sfnm|Chandraseta|2021|Button|2021|Temitope|2024}} This data efficiency influenced later work in AI voice synthesis, with the 15-second benchmark becoming a reference point for subsequent systems. The original claim that only 15 seconds of data is required to clone a human's voice was corroborated by ] in 2024.{{sfnm|OpenAI|2024|1ref=OpenAI-2024|Temitope|2024}}
== See also ==
*]
*]
*]
*] | |||
*] | |||
*] | |||
*] | |||
*] | |||
*] | |||
*]
*] | |||
== Explanatory footnotes == | |||
==Notes== | |||
{{notelist}}
== References ==
=== Notes === | |||
{{reflist}}
;Tweets | |||
{{reflist|group=tweet|35em}} | |||
;YouTube (referenced for view counts and usage of 15.ai only) | |||
{{reflist|group=yt|35em}} | |||
;TikTok | |||
{{reflist|group=tt|35em}} | |||
=== Works cited === | |||
==External links== | |||
{{refbegin|2}} | |||
* {{Official website|15.ai}} | |||
<!--A--> | |||
* {{Twitter | id= fifteenai | name= 15 }} | |||
<!--B--> | |||
* {{cite journal | |||
| last1 = Barakat | |||
| first1 = Huda | |||
| last2 = Turk | |||
| first2 = Oytun | |||
| last3 = Demiroglu | |||
| first3 = Cenk | |||
| title = Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources | |||
| journal = EURASIP Journal on Audio, Speech, and Music Processing | |||
| volume = 2024 | |||
| issue = 11 | |||
| year = 2024 | |||
| doi = | |||
| pages = | |||
}} | |||
* {{cite web |last=Button |first=Chris |date=January 19, 2021 |title=Make GLaDOS, SpongeBob and other friends say what you want with this AI text-to-speech tool |url=https://www.byteside.com/2021/01/15-ai-deepmoji-glados-spongebob-characters-ai-text-to-speech/ |url-status=live |access-date=December 18, 2024 |website=Byteside |quote= |archive-date=June 25, 2024 |archive-url=https://web.archive.org/web/20240625180514/https://www.byteside.com/2021/01/15-ai-deepmoji-glados-spongebob-characters-ai-text-to-speech/}} | |||
<!--C--> | |||
* {{cite web |last=Chandraseta |first=Rionaldi |date=January 21, 2021 |title=Generate Your Favourite Characters' Voice Lines using Machine Learning |url=https://towardsdatascience.com/generate-your-favourite-characters-voice-lines-using-machine-learning-c0939270c0c6 |url-status=live |access-date=December 18, 2024 |website=Towards Data Science |archive-date=January 21, 2021 |archive-url=https://web.archive.org/web/20210121132456/https://towardsdatascience.com/generate-your-favourite-characters-voice-lines-using-machine-learning-c0939270c0c6}} | |||
* {{cite web |last=Clayton |first=Natalie |date=January 19, 2021 |title=Make the cast of TF2 recite old memes with this AI text-to-speech tool |url=https://www.pcgamer.com/make-the-cast-of-tf2-recite-old-memes-with-this-ai-text-to-speech-tool |url-status=live |archive-url=https://web.archive.org/web/20210119133726/https://www.pcgamer.com/make-the-cast-of-tf2-recite-old-memes-with-this-ai-text-to-speech-tool/ |archive-date=January 19, 2021 |access-date=December 18, 2024 |website=] |quote=}} | |||
* {{cite news |author=<!--not stated--> |date=January 15, 2021|title=CNN Newsroom|url=https://prod.transcripts.cnn.com/show/cnr/date/2021-01-15/segment/18|publisher=]|ref={{harvid|CNN|2021}}}} | |||
<!--D--> | |||
* {{cite web|url=https://arkade.com.br/faca-glados-bob-esponja-e-outros-personagens-falarem-textos-escritos-por-voce/|trans-title=Make GLaDOS, SpongeBob and other characters speak texts written by you!|last=do Prado|first=Renan|website=Arkade|access-date=December 22, 2024|date=January 19, 2021|language=pt-br|title=Faça GLaDOS, Bob Esponja e outros personagens falarem textos escritos por você!|archive-url=https://web.archive.org/web/20220819193854/https://www.arkade.com.br/faca-glados-bob-esponja-e-outros-personagens-falarem-textos-escritos-por-voce/|archive-date=August 19, 2022}} | |||
<!--E--> | |||
* {{cite web |date=2024a<!--February 7, 2024--> |title=15.AI: Everything You Need to Know & Best Alternatives |url=https://elevenlabs.io/blog/15-ai |url-status=live |access-date=December 18, 2024 |website=] |quote=Combining speech synthesis with machine learning, deep learning, deep neural networks, and audio synthesis algorithms, 15.ai transformed how users created different voices with AI text. |archive-date=December 25, 2024 |archive-url=https://web.archive.org/web/20241225034801/https://elevenlabs.io/blog/15-ai |ref=ElevenLabs-2024a}} | |||
* {{cite web |title=Can I publish the content I generate on the platform? |url=https://help.elevenlabs.io/hc/en-us/articles/13313564601361-Can-I-publish-the-content-I-generate-on-the-platform |website=] |access-date=23 December 2024 |date=2024b<!--8 May 2024--> |type=Official website |ref=ElevenLabs-2024b}} | |||
* {{cite web|date=October 1, 2021|access-date=December 22, 2024|url=https://www.equestriacn.com/2021/10/15-ai-is-back-online-updated-to-v23.html|title=15.ai已经重新上线,版本更新至v23|trans-title=15.ai has been re-launched, version updated to v23|website=EquestriaCN|language=zh |ref=www.equestriacn.com|archive-url=https://web.archive.org/web/20240519181715/https://www.equestriacn.com/2021/10/15-ai-is-back-online-updated-to-v23.html|archive-date=May 19, 2024}} | |||
<!--F--> | |||
* {{Cite tweet |number=1482088782765576192 |user=fifteenai |title=Go fuck yourself. |date=January 14, 2022}}{{sfn whitelist|CITEREFfifteenai2022}} | |||
<!--G--> | |||
* {{cite web |date=January 18, 2021 |title=这个网站可用AI生成语音 让ACG角色"说"出你输入的文本 |trans-title=This Website Can Use AI to Generate Voice, Making ACG Characters "Say" the Text You Input |url=https://www.gamersky.com/news/202101/1355887.shtml |url-status=live |access-date=December 18, 2024 |website=] |language=zh |quote= |trans-quote= |archive-date=December 11, 2024 |archive-url=https://web.archive.org/web/20241211221628/https://www.gamersky.com/news/202101/1355887.shtml |ref={{harvid|GamerSky|2021}}}} | |||
*{{cite web|url=https://google.github.io/tacotron/publications/semisupervised/index.html|title=Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"|date=2018-08-30|access-date=2022-06-05|archive-date=2020-11-11|archive-url=https://web.archive.org/web/20201111222714/https://google.github.io/tacotron/publications/semisupervised/index.html|url-status=live|ref={{harvid|Google|2018}}}} | |||
<!--H--> | |||
*{{cite web | |||
|url= https://news.ycombinator.com/item?id=31711118 | |||
|title= 15.ai | |||
|last= | |||
|first= | |||
|date= June 12, 2022 | |||
|website= ] | |||
|publisher= | |||
|access-date= December 29, 2024|ref={{harvid|Hacker News|2022}} | |||
}} | |||
<!--K--> | |||
* {{cite arXiv|title=Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search|last=Kim|first=Jaehyeon|eprint=2005.11129|year=2020|class=eess.AS }} | |||
* {{cite web |last=Knight |first=Will |date=August 3, 2017 |title=An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter |url=https://www.technologyreview.com/2017/08/03/105566/an-algorithm-trained-on-emoji-knows-when-youre-being-sarcastic-on-twitter/ |url-status=live |archive-url=https://web.archive.org/web/20220602215737/https://www.technologyreview.com/2017/08/03/105566/an-algorithm-trained-on-emoji-knows-when-youre-being-sarcastic-on-twitter/ |archive-date=June 2, 2022 |access-date=December 18, 2024 |website=]}} | |||
* {{cite arXiv|title=HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis|last=Kong|first=Jungil|eprint=2010.05646|year=2020|class=cs.SD }} | |||
* {{cite web |last=Kurosawa |first=Yuki |date=January 19, 2021 |title=ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる |trans-title=Game Character Voice Reading Software "15.ai" Now Available. Get Characters from Undertale and Portal to Say Your Desired Lines |url=https://automaton-media.com/articles/newsjp/20210119-149494/ |url-status=live |archive-url=https://web.archive.org/web/20210119103031/https://automaton-media.com/articles/newsjp/20210119-149494/ |archive-date=January 19, 2021 |access-date=December 18, 2024 |website=] |language=ja |quote=英語版ボイスのみなので注意。;もうひとつ15.aiの大きな特徴として挙げられるのが、豊かな感情表現だ。 |trans-quote=Please note that only English voices are available.;Another major feature of 15.ai is its rich emotional expression.}} | |||
<!--L--> | |||
* {{cite web |last1=Lawrence |first1=Briana |title=Shonen Jump Scare Leads to Company Reassuring Fans That They Aren't Getting Into NFTs |url=https://www.themarysue.com/shonen-jump-not-doing-nfts/ |website=] |access-date=23 December 2024 |date=19 January 2022}} | |||
* {{cite web |last=Li |first=Yongqiang |title=语音开源项目优选:免费配音网站15.ai |trans-title=Voice Open Source Project Selection: Free Voice Acting Website 15.ai |url=https://zhuanlan.zhihu.com/p/346417192 |access-date=December 18, 2024 |date=2021 |website=] |language=zh |quote= |trans-quote= |url-status=live |archive-url=https://web.archive.org/web/20241219212053/https://zhuanlan.zhihu.com/p/346417192 |archive-date=December 19, 2024}} | |||
* {{cite web |last=Lopez |first=Ule |date=January 16, 2022 |title=Voiceverse NFT Service Reportedly Uses Stolen Technology from 15ai |url=https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/ |url-status=live |archive-url=https://web.archive.org/web/20220116194519/https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/ |archive-date=January 16, 2022 |access-date=June 7, 2022 |website=Wccftech}} | |||
<!--M--> | |||
* {{cite web |last=Menor |first=Deion |date=November 7, 2024 |title=15.ai – Natural and Emotional Text-to-Speech Using Neural Networks|url=https://hashdork.com/15-ai/|url-status=live|access-date=January 3, 2025|website=HashDork}} | |||
* {{cite web |last=Morton |first=Lauren |date=January 18, 2021 |title=Put words in game characters' mouths with this fascinating text to speech tool |url=https://www.rockpapershotgun.com/2021/01/18/put-words-in-game-characters-mouths-with-this-fascinating-text-to-speech-tool/ |url-status=live |archive-url=https://web.archive.org/web/20210118213308/https://www.rockpapershotgun.com/2021/01/18/put-words-in-game-characters-mouths-with-this-fascinating-text-to-speech-tool/ |archive-date=January 18, 2021 |access-date=December 18, 2024 |website=] |quote=}} | |||
* {{cite web |last1=Moto |first1=Eugenio |title=15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras |url=https://www.qore.com/noticias/78756/15ai-el-sitio-que-te-permite-usar-voces-de-personajes-populares-para-que-digan-lo-que-quieras/ |website=Qore |access-date=21 December 2024 |archive-url=https://web.archive.org/web/20241228150636/https://www.qore.com/noticias/78756/15ai-el-sitio-que-te-permite-usar-voces-de-personajes-populares-para-que-digan-lo-que-quieras/|archive-date=December 28, 2024|url-status=live|language=es |date=20 January 2021 |quote=Si bien los resultados ya son excepcionales, sin duda pueden mejorar más |trans-quote=While the results are already exceptional, without a doubt they can improve even more}} | |||
* {{cite web|author=MrSun |url=https://tw.news.yahoo.com/15-ai-044220764.html|date=January 19, 2021|access-date=December 22, 2024|title=讓你喜愛的ACG角色說出任何話! AI生成技術幫助你實現夢想|trans-title=Let your favorite ACG characters say anything! AI generation technology helps you realize your dreams|language=zh|archive-url=https://web.archive.org/web/20241228151221/https://tw.news.yahoo.com/15-ai-044220764.html|website=Yahoo|archive-date=December 28, 2024}} | |||
<!--N--> | |||
* {{cite web |last=Ng |first=Andrew |date=April 1, 2020 |title=Voice Cloning for the Masses |url=https://www.deeplearning.ai/the-batch/voice-cloning-for-the-masses/|access-date=December 22, 2024 |website=] |quote= |archive-url=https://web.archive.org/web/20241228151726/https://www.deeplearning.ai/the-batch/voice-cloning-for-the-masses/ |archive-date=December 28, 2024 }} | |||
<!--O--> | |||
* {{cite web |title=Navigating the Challenges and Opportunities of Synthetic Voices |url=https://openai.com/index/navigating-the-challenges-and-opportunities-of-synthetic-voices/ |website=] |url-status=live |date=March 9, 2024 |access-date=December 18, 2024 |archive-date=November 25, 2024 |archive-url=https://web.archive.org/web/20241125181327/https://openai.com/index/navigating-the-challenges-and-opportunities-of-synthetic-voices/ |ref=OpenAI-2024}} | |||
<!--R--> | |||
* {{cite magazine |last=Ruppert |first=Liana |date=January 18, 2021 |title=Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App |url=https://www.gameinformer.com/gamer-culture/2021/01/18/make-portals-glados-and-other-beloved-characters-say-the-weirdest-things |url-status=dead |archive-url=https://web.archive.org/web/20210118175543/https://www.gameinformer.com/gamer-culture/2021/01/18/make-portals-glados-and-other-beloved-characters-say-the-weirdest-things |archive-date=January 18, 2021 |access-date=December 18, 2024 |magazine=] |quote=}} | |||
<!--P--> | |||
* {{cite web|last=Paltridge|first=Peter|url=https://animesuperhero.com/this-website-will-say-whatever-you-type-in-spongebobs-voice/|title=This Website Will Say Whatever You Type In Spongebob's Voice|website=Anime Superhero News|access-date=December 22, 2024|date=January 18, 2021|archive-url=https://web.archive.org/web/20211017003838/https://animesuperhero.com/this-website-will-say-whatever-you-type-in-spongebobs-voice/|archive-date=October 17, 2021}} | |||
* {{cite web |last=Phillips |first=Tom |date=January 14, 2022 |title=Video game voice actor Troy Baker is now promoting NFTs|url=https://www.eurogamer.net/video-game-voice-actor-troy-baker-is-now-promoting-nfts |access-date=December 31, 2024 |website=] |quote= |ref={{harvid|Phillips|2022a}}}} | |||
* {{cite web |last=Phillips |first=Tom |date=January 17, 2022 |title=Troy Baker-backed NFT firm admits using voice lines taken from another service without permission |url=https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission |url-status=live |archive-url=https://web.archive.org/web/20220117164033/https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission |archive-date=January 17, 2022 |access-date=December 31, 2024 |website=] |quote=|ref={{harvid|Phillips|2022b}}}} | |||
* {{cite web |date=September 12, 2024 |title=Everything You Need to Know About 15.ai: The AI Voice Generator |url=https://play.ht/blog/15-ai/ |access-date=December 18, 2024 |website=Play.ht |ref=Play.ht-2024 |url-status=live |archive-url=https://web.archive.org/web/20241225034801/https://play.ht/blog/15-ai/|archive-date=December 25, 2024}} | |||
<!--R--> | |||
* {{cite arXiv|title=FastSpeech: Fast, Robust and Controllable Text to Speech|last=Ren|first=Yi|eprint=1905.09263|year=2019|class=cs.CL }}<!-- Check - Ren Yi or Yi Ren? --> | |||
* {{cite web |date= October 17, 2024|title=Free 15.ai Character Voice Cloning and Alternatives |url=https://www.resemble.ai/free-15ai-character-voice-cloning-alternatives/ |access-date=December 31, 2024 |website=Resemble.ai |ref=Resemble.ai-2024}} | |||
<!--S--> | |||
* {{cite web |last=Scotellaro |first=Shaun |date=2020a<!--March 31, 2020--> |title=Rainbow Dash Voice Added to 15.ai |url=https://www.equestriadaily.com/2020/03/rainbow-dash-voice-added-to-15ai.html |url-status=live |access-date=December 18, 2024 |website=] |quote= |archive-date=December 1, 2024 |archive-url=https://web.archive.org/web/20241201163118/https://www.equestriadaily.com/2020/03/rainbow-dash-voice-added-to-15ai.html}} | |||
* {{cite web |last=Scotellaro |first=Shaun |date=2020b<!--October 5, 2020-->|title=15.ai Adds Tons of New Pony Voices|url=https://www.equestriadaily.com/2020/10/15ai-adds-tons-of-new-pony-voices.html|access-date=December 21, 2024|website=]|url-status=live|archive-url=https://web.archive.org/web/20241226174007/https://www.equestriadaily.com/2020/10/15ai-adds-tons-of-new-pony-voices.html|archive-date=December 26, 2024}} | |||
* {{cite web |last=Scotellaro |first=Shaun |date=2020c<!--March 4, 2020--> |title=Neat "Pony Preservation Project" Using Neural Networks to Create Pony Voices |url=https://www.equestriadaily.com/2020/03/neat-pony-preservation-project-using.html |url-status=live |access-date=December 18, 2024 |website=] |archive-date=June 23, 2021 |archive-url=https://web.archive.org/web/20210623210048/https://www.equestriadaily.com/2020/03/neat-pony-preservation-project-using.html}} | |||
* {{cite web |last=Scotellaro |first=Shaun |date=2020d<!--March 4, 2020--> |title=Full Simple Animated Episode - The Tax Breaks (Twilight)|url=https://www.equestriadaily.com/2022/05/full-simple-animated-episode-tax-breaks.html |access-date=January 1, 2025 |website=]}} | |||
* {{cite web |last=Scotellaro |first=Shaun |date=2020e<!--March 4, 2020--> |title=More Pony Music! We Shine Brighter Together!|url=https://www.equestriadaily.com/2020/12/more-pony-music-we-shine-brighter.html|access-date=January 1, 2025 |website=]}} | |||
* {{cite web |last=Scotellaro |first=Shaun |date=2020f<!--March 4, 2020--> |title=New Among Us Animation Goes Viral... With Pony Voices|url=https://www.equestriadaily.com/2020/09/new-among-us-animation-goes-viral-with.html|access-date=January 1, 2025 |website=]}} | |||
<!--T--> | |||
* {{cite web |last=Temitope |first=Yusuf |date=December 10, 2024 |title=15.ai Creator reveals journey from MIT Project to internet phenomenon|url=https://guardian.ng/technology/15-ai-creator-reveals-journey-from-mit-project-to-internet-phenomenon/ |access-date=December 25, 2024 |website=] |quote= |archive-url=https://web.archive.org/web/20241228152312/https://guardian.ng/technology/15-ai-creator-reveals-journey-from-mit-project-to-internet-phenomenon/ |archive-date=December 28, 2024}} | |||
* {{cite web |date=January 20, 2021 |title=게임 캐릭터 음성으로 영어를 읽어주는 소프트 15.ai 공개. |trans-title=Software 15.ai Released That Reads English in Game Character Voices |url=https://zuntata.tistory.com/7283 |access-date=December 18, 2024 |website=] |language=ko |quote= |trans-quote= |ref=Tistory-2021 |url-status=live | archive-url=https://web.archive.org/web/20241220040335/https://zuntata.tistory.com/7283 |archive-date=December 20, 2024}} | |||
<!--U--> | |||
* {{cite web |last=遊戲 |first=遊戲角落 |date=January 20, 2021 |title=這個AI語音可以模仿《傳送門》GLaDOS講出任何對白!連《Undertale》都可以學 |trans-title=This AI Voice Can Imitate Portal's GLaDOS Saying Any Dialog! It Can Even Learn Undertale |url=https://game.udn.com/game/story/10453/5189551 |url-status=live |access-date=December 18, 2024 |website=] |language=zh-tw |archive-date=December 19, 2024 |archive-url=https://web.archive.org/web/20241219214330/https://game.udn.com/game/story/10453/5189551}} | |||
<!--V--> | |||
* {{Cite arXiv|last1=van den Oord|first1=Aaron|last2=Dieleman|first2=Sander|last3=Zen|first3=Heiga|last4=Simonyan|first4=Karen|last5=Vinyals|first5=Oriol|last6=Graves|first6=Alex|last7=Kalchbrenner|first7=Nal|last8=Senior|first8=Andrew|last9=Kavukcuoglu|first9=Koray|date=2016-09-12|title=WaveNet: A Generative Model for Raw Audio|class=cs.SD |eprint=1609.03499}} | |||
* {{cite web |last=Villalobos |first=José |date=January 18, 2021 |title=Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras |trans-title=Discover 15.AI, a Website Where You Can Make GlaDOS Say What You Want |url=https://www.laps4.com/noticias/descubre-15-ai-un-sitio-web-en-el-que-podras-hacer-que-glados-diga-lo-que-quieras/ |url-status=live |archive-url=https://web.archive.org/web/20210118172043/https://www.laps4.com/noticias/descubre-15-ai-un-sitio-web-en-el-que-podras-hacer-que-glados-diga-lo-que-quieras/ |archive-date=January 18, 2021 |access-date=January 18, 2021 |website=LaPS4 |language=es |quote=La dirección es 15.AI y funciona tan fácil como parece. |trans-quote=The address is 15.AI and it works as easy as it looks.}} | |||
* {{cite web |author=Anirudh VK |date=March 18, 2023 |title=Deepfakes Are Elevating Meme Culture, But At What Cost? |url=https://analyticsindiamag.com/ai-origins-evolution/deepfakes-are-elevating-meme-culture-but-at-what-cost/ |access-date=December 18, 2024 |website=Analytics India Magazine |quote="While AI voice memes have been around in some form since '15.ai' launched in 2020, "|url-status=live|archive-url=https://web.archive.org/web/20241226163953/https://analyticsindiamag.com/ai-origins-evolution/deepfakes-are-elevating-meme-culture-but-at-what-cost/|archive-date=December 26, 2024}} | |||
<!--W--> | |||
* {{cite web |last=Weitzman|first=Cliff|date=November 19, 2023 |title=15.ai: All about 15.ai and the best alternative |url=https://speechify.com/blog/15-ai/ |access-date=December 31, 2024 |website=] |ref=Speechify-2024}} | |||
* {{cite web |last=Williams |first=Demi |date=January 18, 2022 |title=Voiceverse NFT admits to taking voice lines from non-commercial service |url=https://www.nme.com/news/gaming-news/voiceverse-nft-admits-to-taking-voice-lines-from-non-commercial-service-3140663 |url-status=live |archive-url=https://web.archive.org/web/20220118162845/https://www.nme.com/news/gaming-news/voiceverse-nft-admits-to-taking-voice-lines-from-non-commercial-service-3140663 |archive-date=January 18, 2022 |access-date=December 18, 2024 |website=] |quote=}} | |||
* {{cite web |last=Wright |first=Steve |date=January 17, 2022 |title=Troy Baker-backed NFT company admits to using content without permission |url=https://stevivor.com/news/troy-baker-nft-voiceverse-15-ai/ |url-status=live |archive-url=https://web.archive.org/web/20220117231918/https://stevivor.com/news/troy-baker-nft-voiceverse-15-ai/ |archive-date=January 17, 2022 |access-date=December 18, 2024 |website=Stevivor |quote=}} | |||
* {{cite web |last=Wright |first=Steven |date=March 21, 2023 |title=Why Biden, Trump, and Obama Arguing Over Video Games Is YouTube's New Obsession |url=https://www.inverse.com/gaming/youtube-ai-presidential-gaming-debates |url-status=live |access-date=December 18, 2024 |website=] |quote="AI voice tools used to create "audio deepfakes" have existed for years in one form or another, with 15.ai being a notable example." |archive-date=December 20, 2024 |archive-url=https://web.archive.org/web/20241220012854/https://www.inverse.com/gaming/youtube-ai-presidential-gaming-debates}} | |||
<!--Y--> | |||
* {{cite web |last=Yoshiyuki |first=Furushima |date=January 18, 2021 |title=『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に |trans-title=Portal's GLaDOS and UNDERTALE's Sans Will Read Text for You. "15.ai" Service Aims to Reproduce Even the Emotions in Text, Becomes Topic of Discussion |url=https://news.denfaminicogamer.jp/news/210118f |url-status=live |archive-url=https://web.archive.org/web/20210118051321/https://news.denfaminicogamer.jp/news/210118f |archive-date=January 18, 2021 |access-date=December 18, 2024 |website=] |language=ja }} | |||
<!--Z--> | |||
* {{cite web |last=Zwiezen |first=Zack |date=January 18, 2021 |title=Website Lets You Make GLaDOS Say Whatever You Want |url=https://kotaku.com/this-website-lets-you-make-glados-say-whatever-you-want-1846062835 |url-status=live |archive-url=https://web.archive.org/web/20210117164748/https://kotaku.com/this-website-lets-you-make-glados-say-whatever-you-want-1846062835 |archive-date=January 17, 2021 |access-date=December 18, 2024 |website=] |quote=}} | |||
{{refend}} | |||
{{Differentiable computing}} | |||
{{Speech synthesis}} | |||
== External links == | |||
{{My Little Pony: Friendship Is Magic}} | |||
* {{Official website}} | |||
{{Speech synthesis}} | |||
{{Generative AI}} | |||
{{Artificial intelligence navbox}} | |||
{{My Little Pony: Friendship Is Magic|state=expanded}} |
Real-time text-to-speech AI tool
Type of site | Artificial intelligence, speech synthesis, generative artificial intelligence |
---|---|
Available in | English |
Founder(s) | 15 |
URL | 15.ai |
Commercial | No |
Registration | None |
Launched | March 2020 |
Current status | Inactive |
15.ai was a free non-commercial web application that used artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by an artificial intelligence researcher known as 15 during their time at the Massachusetts Institute of Technology, the application allowed users to make characters from video games, television shows, and movies speak custom text with emotional inflections faster than real-time. The platform was notable for its ability to generate convincing voice output using minimal training data—the name "15.ai" referenced the creator's claim that a voice could be cloned with just 15 seconds of audio. It was an early example of an application of generative artificial intelligence during the initial stages of the AI boom.
Launched in March 2020, 15.ai gained widespread attention in early 2021 when it went viral on social media platforms like YouTube and Twitter, and quickly became popular among Internet fandoms, including the My Little Pony: Friendship Is Magic, Team Fortress 2, and SpongeBob SquarePants fandoms. The service distinguished itself by supporting emotional context in speech generation through emojis and precise pronunciation control through phonetic transcriptions. 15.ai is credited as the first mainstream platform to popularize AI voice cloning (audio deepfakes) in memes and content creation.
15.ai's approach to data-efficient voice synthesis and emotional expression was influential in subsequent developments in AI text-to-speech technology. In January 2022, Voiceverse NFT sparked controversy when it was discovered that the company, which had partnered with voice actor Troy Baker, had misappropriated 15.ai's work for their own platform. The service was ultimately taken offline in September 2022. Its shutdown led to the emergence of various commercial alternatives in subsequent years.
History
Background
For broader coverage of this topic, see Deep learning speech synthesis.

The field of artificial speech synthesis underwent a significant transformation with the introduction of deep learning approaches. In 2016, DeepMind's publication of the seminal paper WaveNet: A Generative Model for Raw Audio marked a pivotal shift toward neural network-based speech synthesis, demonstrating unprecedented audio quality through dilated causal convolutions that operated directly on raw audio waveforms at 16,000 samples per second and modeled the conditional probability distribution of each audio sample given all previous ones. Previously, concatenative synthesis, which worked by stitching together pre-recorded segments of human speech, had been the predominant method for generating artificial speech, but it often produced robotic-sounding results with noticeable artifacts at the segment boundaries. The following year, Google AI's Tacotron demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data (typically tens of hours of audio) to achieve acceptable quality. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while remaining intelligible, and with just 24 minutes of training data, Tacotron failed to produce intelligible speech. In 2020 came HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech, and Glow-TTS, which introduced a flow-based approach that allowed for both fast inference and voice style transfer capabilities. Chinese tech companies also made significant contributions to the field, with Baidu and ByteDance developing proprietary text-to-speech frameworks that further advanced the state of the art, though specific technical details of their implementations remained largely undisclosed.
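The WaveNet formulation mentioned above, in which each audio sample is predicted from all the samples before it, can be written as a product of conditional probabilities. As a sketch, with x_t denoting the t-th audio sample and T the total number of samples in the waveform (symbol names chosen here for illustration):

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

At 16,000 samples per second, one second of audio corresponds to T = 16,000 factors in this product, which helps explain why generating speech one sample at a time was computationally demanding.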
Development, release, and operation
15, Hacker News:

The website has multiple purposes. It serves as a proof of concept of a platform that allows anyone to create content, even if they can't hire someone to voice their projects.
It also demonstrates the progress of my research in a far more engaging manner - by being able to use the actual model, you can discover things about it that even I wasn't aware of (such as getting characters to make gasping noises or moans by placing commas in between certain phonemes).
It also doesn't let me get away with picking and choosing the best results and showing off only the ones that work. Being able to interact with the model with no filter allows the user to judge exactly how good the current work is at face value.
15.ai was conceived in 2016 as a research project in deep learning speech synthesis by a developer known as "15" (at the age of 18) during their freshman year at the Massachusetts Institute of Technology (MIT) as part of MIT's Undergraduate Research Opportunities Program (UROP). The developer was inspired by DeepMind's WaveNet paper, with development continuing through their studies as Google AI released Tacotron the following year. By 2019, the developer had demonstrated at MIT their ability to replicate WaveNet and Tacotron's results using 75% less training data than previously required. The name 15 is a reference to the creator's claim that a voice can be cloned with as little as 15 seconds of data.
The developer had originally planned to pursue a doctorate based on their undergraduate research, but opted to work in the tech industry instead after their startup was accepted into the Y Combinator accelerator in 2019. After their departure in early 2020, the developer returned to their voice synthesis research, implementing it as a web application. According to the developer, instead of using conventional voice datasets like LJSpeech that contained simple, monotone recordings, they sought out more challenging voice samples that could demonstrate the model's ability to handle complex speech patterns and emotional undertones. The Pony Preservation Project—a fan initiative originating from /mlp/, 4chan's My Little Pony board, that had compiled voice clips from My Little Pony: Friendship Is Magic—played a crucial role in the implementation. The project's contributors had manually trimmed, denoised, transcribed, and emotion-tagged every line from the show. This dataset provided ideal training material for 15.ai's deep learning model.
15.ai was released in March 2020 with a limited selection of characters, including those from My Little Pony: Friendship Is Magic and Team Fortress 2. More voices were added to the website in the following months. A significant technical advancement came in late 2020 with the implementation of a multi-speaker embedding in the deep neural network, enabling simultaneous training of multiple voices rather than requiring individual models for each character voice. This not only allowed rapid expansion from eight to over fifty character voices, but also let the model recognize common emotional patterns across characters, even when certain emotions were missing from some characters' training data.
In early 2021, the application went viral on Twitter and YouTube, with people using voices from popular games and shows to generate skits, memes, and fan content that accumulated millions of views on social media. Content creators, YouTubers, and TikTokers also used 15.ai for voiceovers in their videos. At its peak, the platform incurred operational costs of US$12,000 per month from the AWS infrastructure needed to handle millions of daily voice generations; despite receiving offers from companies to acquire 15.ai and its underlying technology, the website remained independent and was funded out of the previous startup earnings of the developer, who was 23 at the time.
Voiceverse NFT controversy
See also: Non-fungible token § Plagiarism and fraud
Troy Baker @TroyBakerVA I'm partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create. We all have a story to tell. You can hate. Or you can create. What'll it be?
January 14, 2022
On January 14, 2022, a controversy ensued after it was discovered that Voiceverse NFT, a company with which video game and anime dub voice actor Troy Baker had announced a partnership, had misappropriated voice lines generated from 15.ai as part of its marketing campaign. This came shortly after 15.ai's developer had explicitly stated in December 2021 that they had no interest in incorporating NFTs into their work. Log files showed that Voiceverse had generated audio of characters from My Little Pony: Friendship Is Magic using 15.ai and pitched it up to make the voices unrecognizable, then used the result to market its own platform, in violation of 15.ai's terms of service.
Voiceverse claimed that someone in their marketing team used the voice without properly crediting 15.ai; in response, 15 tweeted "Go fuck yourself," which went viral, amassing hundreds of thousands of retweets and likes on Twitter in support of the developer. Following continued backlash and the plagiarism revelation, Baker acknowledged that his original announcement tweet ending with "You can hate. Or you can create. What'll it be?" may have been "antagonistic," and on January 31, 2022, announced he would discontinue his partnership with Voiceverse.
Inactivity
In September 2022, 15.ai was taken offline due to legal issues surrounding artificial intelligence and copyright. The creator has suggested a potential future version that would better address copyright concerns from the outset, though the website remains inactive as of 2025.
Features
The platform was non-commercial and operated without requiring user registration or accounts. Users generated speech by inputting text and selecting a character voice, with optional parameters for emotional contextualizers and phonetic transcriptions. Each request produced three audio variations with distinct emotional deliveries, sorted by confidence score. Characters available included multiple characters from Team Fortress 2 and My Little Pony: Friendship Is Magic; GLaDOS, Wheatley, and the Sentry Turret from the Portal series; SpongeBob SquarePants; Kyu Sugardust from HuniePop; Rise Kujikawa from Persona 4; Daria Morgendorffer and Jane Lane from Daria; Carl Brutananadilewski from Aqua Teen Hunger Force; Steven Universe from Steven Universe; Sans from Undertale; Madeline and multiple other characters from Celeste; the Tenth Doctor from Doctor Who; the Narrator from The Stanley Parable; and HAL 9000 from 2001: A Space Odyssey. Out of the over fifty voices available, thirty were of characters from My Little Pony: Friendship Is Magic. Certain "silent" characters like Chell and Gordon Freeman could be selected as a joke, and would emit silent audio files when any text was submitted.
The deep learning model's nondeterministic properties produced variations in speech output, creating different intonations with each generation, similar to how voice actors produce different takes. 15.ai introduced the concept of emotional contextualizers, which allowed users to specify the emotional tone of generated speech through guiding phrases. The emotional contextualizer functionality utilized DeepMoji, a sentiment analysis neural network developed at the MIT Media Lab. Introduced in 2017, DeepMoji processed emoji embeddings from 1.2 billion Twitter posts (from 2013 to 2017) to analyze emotional content. Testing showed the system could identify emotional elements, including sarcasm, more accurately than human evaluators. If an input into 15.ai contained additional context (specified by a vertical bar), the additional context following the bar would be used as the emotional contextualizer. For example, if the input was "Today is a great day!|I'm very sad.", the selected character would speak the sentence "Today is a great day!" in the emotion one would expect from someone saying the sentence "I'm very sad."
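The vertical-bar convention described above can be illustrated with a short Python sketch. The function name `split_contextualizer` is hypothetical, and the fallback of treating the spoken text as its own context when no bar is present is an assumption made for illustration; the sources do not describe how 15.ai parsed its input internally.

```python
def split_contextualizer(user_input):
    """Split input into (spoken_text, emotional_context).

    Hypothetical helper: text after the first vertical bar is treated as
    the emotional contextualizer. When no bar is present, this sketch
    assumes the spoken text also serves as its own context.
    """
    spoken, bar, context = user_input.partition("|")
    spoken = spoken.strip()
    context = context.strip() if bar else spoken
    return spoken, context


spoken, context = split_contextualizer("Today is a great day!|I'm very sad.")
print(spoken)   # Today is a great day!
print(context)  # I'm very sad.
```

In a full system, the second value would presumably be passed to a sentiment model such as DeepMoji, while only the first would be synthesized as speech.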
The application used pronunciation data from the Oxford Dictionaries API, Wiktionary, and the CMU Pronouncing Dictionary, the last of which is based on ARPABET, a set of English phonetic transcriptions originally developed by the Advanced Research Projects Agency in the 1970s. For modern and Internet-specific terminology, the system incorporated pronunciation data from user-generated content websites, including Reddit, Urban Dictionary, 4chan, and Google. Inputting ARPABET transcriptions was also supported, allowing users to correct mispronunciations or specify the desired pronunciation between heteronyms (words that have the same spelling but different pronunciations). Users could invoke ARPABET transcriptions by enclosing the phoneme string in curly braces within the input box (for example, {AA1 R P AH0 B EH2 T} to specify the pronunciation of the word "ARPABET" (/ˈɑːrpəˌbɛt/ AR-pə-beht)). The interface displayed parsed words with color-coding to indicate pronunciation certainty: green for words found in the existing pronunciation lookup table, blue for manually entered ARPABET pronunciations, and red for words where the pronunciation had to be algorithmically predicted.
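The curly-brace syntax and color-coding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `LOOKUP` table is a tiny hypothetical stand-in for the dictionaries the site drew on, and the `classify` function and its regular expression are not taken from 15.ai itself.

```python
import re

# Hypothetical miniature lookup table; the real site drew on much larger
# sources such as the CMU Pronouncing Dictionary.
LOOKUP = {
    "today": "T AH0 D EY1",
    "is": "IH1 Z",
    "a": "AH0",
    "great": "G R EY1 T",
    "day": "D EY1",
}

# Matches either a {PHONEME STRING} group or a plain word.
TOKEN = re.compile(r"\{([^{}]+)\}|([A-Za-z']+)")


def classify(text):
    """Return (token, color) pairs following the article's color scheme."""
    result = []
    for match in TOKEN.finditer(text):
        phonemes, word = match.groups()
        if phonemes is not None:
            # Manually entered ARPABET transcription.
            result.append((phonemes.strip(), "blue"))
        elif word.lower() in LOOKUP:
            # Pronunciation found in the lookup table.
            result.append((word, "green"))
        else:
            # Pronunciation would have to be predicted algorithmically.
            result.append((word, "red"))
    return result


for token, color in classify("Today is {AA1 R P AH0 B EH2 T} zorp"):
    print(color, token)
```

Here "Today" and "is" are marked green, the braced phoneme string blue, and the made-up word "zorp" red, since it is absent from the toy table.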
Later versions of 15.ai introduced multi-speaker capabilities. Rather than training separate models for each voice, 15.ai used a unified model that learned multiple voices simultaneously through speaker embeddings, learned numerical representations that captured each character's unique vocal characteristics. Along with the emotional context conferred by DeepMoji, this neural network architecture enabled the model to learn shared patterns across different characters' emotional expressions and speaking styles, even when individual characters lacked examples of certain emotional contexts in their training data.
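The general idea of a speaker embedding can be sketched in a few lines of Python. This is a toy illustration, not 15.ai's architecture, which has not been published: the vector size, the function names, and the choice to append the speaker vector to each text feature frame are all assumptions, and random numbers stand in for values that a real model would learn during training.

```python
import random

EMBED_DIM = 4  # hypothetical size, chosen small for readability


def make_embedding_table(speakers, seed=0):
    """Create one vector per speaker.

    Random values stand in for parameters that a real model would learn
    during training.
    """
    rng = random.Random(seed)
    return {
        name: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
        for name in speakers
    }


def condition_on_speaker(text_frames, speaker, table):
    """Append the chosen speaker's vector to every text feature frame."""
    vector = table[speaker]
    return [frame + vector for frame in text_frames]


table = make_embedding_table(["GLaDOS", "Twilight Sparkle"])
frames = [[0.1, 0.2], [0.3, 0.4]]  # toy text features, two frames
conditioned = condition_on_speaker(frames, "GLaDOS", table)
print(len(conditioned), len(conditioned[0]))  # 2 6
```

Because every voice passes through the same downstream network and differs only in its embedding vector, patterns learned from one character's data can in principle carry over to others, which is the property the paragraph above describes.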
The interface included technical metrics and graphs, which, according to the developer, served to highlight the research aspect of the website. As of v23, released in September 2021, the interface displayed comprehensive model analysis information, including word parsing results and emotional analysis data. The flow and generative adversarial network (GAN) hybrid vocoder and denoiser, introduced in an earlier version, was streamlined to remove manual parameter inputs.
Reception
Critical reception
Critics described 15.ai as easy to use and generally able to convincingly replicate character voices, with occasional mixed results. Natalie Clayton of PC Gamer wrote that SpongeBob SquarePants' voice was replicated well, but noted challenges in mimicking the Narrator from The Stanley Parable: "the algorithm simply can't capture Kevan Brighting's whimsically droll intonation." Zack Zwiezen of Kotaku reported that his girlfriend "was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain". Rionaldi Chandraseta of AI newsletter Towards Data Science observed that "characters with large training data produce more natural dialogues with clearer inflections and pauses between words, especially for longer sentences." Taiwanese newspaper United Daily News also highlighted 15.ai's ability to recreate GLaDOS's mechanical voice, alongside its diverse range of character voice options. Yahoo! News Taiwan reported that "GLaDOS in Portal can pronounce lines nearly perfectly", but also noted that "there are still many imperfections, such as word limit and tone control, which are still a little weird in some words." Chris Button of AI newsletter Byteside called the ability to clone a voice with only 15 seconds of data "freaky" but also called the tech behind it "impressive". The platform's voice generation capabilities were regularly featured on Equestria Daily, a fandom news site dedicated to the show My Little Pony: Friendship Is Magic and its other generations, with documented updates, fan creations, and additions of new character voices. In a post introducing new character additions to 15.ai, Equestria Daily's founder Shaun Scotellaro (also known by his online moniker "Sethisto") wrote that some of the new voices "aren't great due to the lack of samples to draw from, but many are really impressive still anyway."
Multiple other critics also found the word count limit, prosody options, and English-only nature of the application not entirely satisfactory. Peter Paltridge of anime and superhero news outlet Anime Superhero News opined that "voice synthesis has evolved to the point where the more expensive efforts are nearly indistinguishable from actual human speech," but also noted that "In some ways, SAM is still more advanced than this. It was possible to affect SAM’s inflections by using special characters, as well as change his pitch at will. With 15.ai, you’re at the mercy of whatever random inflections you get." Conversely, Lauren Morton of Rock, Paper, Shotgun praised the depth of pronunciation control "if you're willing to get into the nitty gritty of it". Similarly, Eugenio Moto of Spanish news website Qore.com wrote that "the most experienced can change parameters like the stress or the tone." Takayuki Furushima of Den Fami Nico Gamer highlighted the "smooth pronunciations", and Yuki Kurosawa of AUTOMATON noted its "rich emotional expression" as a major feature; both Japanese authors noted the lack of Japanese-language support. Renan do Prado of the Brazilian gaming news outlet Arkade and José Villalobos of Spanish gaming outlet LaPS4 pointed out that while users could create amusing results in Portuguese and Spanish respectively, the generation performed best in English. Chinese gaming news outlet GamerSky called the app "interesting", but also criticized the word count limit of the text and the lack of intonations. South Korean video game outlet Zuntata wrote that "the surprising thing about 15.ai" was that "there's only about 30 seconds of data, but it achieves pronunciation accuracy close to 100%". Machine learning professor Yongqiang Li wrote in his blog that he was surprised to see that the application was free.
Ethical concerns
See also: Deepfake § Concerns and countermeasures

Voice actors had mixed reactions to 15.ai's capabilities. While some industry professionals acknowledged the technical innovation, others raised concerns about the technology's implications for their profession. When voice actor Troy Baker announced his partnership with Voiceverse NFT, which had misappropriated 15.ai's technology, it sparked widespread controversy within the voice acting industry. Critics raised concerns about automated voice acting's potential reduction of employment opportunities for voice actors, risk of voice impersonation, and potential misuse in explicit content. The controversy surrounding Voiceverse NFT and subsequent discussions highlighted broader industry concerns about AI voice synthesis technology.
While 15.ai limited its scope to fictional characters and did not reproduce voices of real people or celebrities, computer scientist Andrew Ng noted that similar technology could be used to do so, including for nefarious purposes. In his 2020 assessment of 15.ai, he wrote:
"Voice cloning could be enormously productive. In Hollywood, it could revolutionize the use of virtual actors. In cartoons and audiobooks, it could enable voice actors to participate in many more productions. In online education, kids might pay more attention to lessons delivered by the voices of favorite personalities. And how many YouTube how-to video producers would love to have a synthetic Morgan Freeman narrate their scripts?"
However, he also wrote:
"...but synthesizing a human actor's voice without consent is arguably unethical and possibly illegal. And this technology will be catnip for deepfakers, who could scrape recordings from social networks to impersonate private individuals."
Legacy
15.ai was an early pioneer of audio deepfakes, leading to the emergence of AI speech synthesis-based memes during the initial stages of the AI boom in 2020. 15.ai is credited as the first mainstream platform to popularize AI voice cloning in Internet memes and content creation, particularly through its ability to generate convincing character voices in real time without requiring extensive technical expertise. The platform's impact was especially notable in fan communities, including the My Little Pony: Friendship Is Magic, Portal, Team Fortress 2, and SpongeBob SquarePants fandoms, where it enabled the creation of viral content that garnered millions of views across social media platforms like Twitter and YouTube. Team Fortress 2 content creators also used the platform to produce both short-form memes and complex narrative animations using Source Filmmaker. Fan creations included skits and new fan animations; crossover content, such as Game Informer writer Liana Ruppert's demonstration combining Portal and Mass Effect dialogue in her coverage of the platform; recreations of viral videos, including the infamous Big Bill Hell's Cars car dealership parody; adaptations of fanfiction using AI-generated character voices; music videos and new musical compositions, such as the explicit Pony Zone series; and content where characters recited sea shanties. Some fan creations gained mainstream attention, such as a viral edit replacing Donald Trump's cameo in Home Alone 2: Lost in New York with the Heavy Weapons Guy's AI-generated voice, which was featured on a daytime CNN segment in January 2021. Some users integrated 15.ai's voice synthesis with VoiceAttack, a voice command program, to create personal assistants.
Its influence has been noted in the years after it became defunct, with several commercial alternatives emerging to fill the void, such as ElevenLabs and Speechify. Contemporary generative voice AI companies have acknowledged 15.ai's pioneering role. PlayHT called the debut of 15.ai "a breakthrough in the field of text-to-speech (TTS) and speech synthesis". Cliff Weitzman, the founder and CEO of Speechify, credited 15.ai for "making AI voice cloning popular for content creation by being the first to feature popular existing characters from fandoms". Mati Staniszewski, the founder and CEO of ElevenLabs, wrote that 15.ai was transformative in the field of AI text-to-speech.
Prior to its shutdown, 15.ai established several technical precedents that influenced subsequent developments in AI voice synthesis. Its integration of DeepMoji for emotional analysis demonstrated the viability of incorporating sentiment-aware speech generation, while its support for ARPABET phonetic transcriptions set a standard for precise pronunciation control in public-facing voice synthesis tools. The platform's unified multi-speaker model, which enabled simultaneous training of diverse character voices, proved particularly influential. This approach allowed the system to recognize emotional patterns across different voices even when certain emotions were absent from individual character training sets; for example, if one character had examples of joyful speech but no angry examples, while another had angry but no joyful samples, the system could learn to generate both emotions for both characters by understanding the common patterns of how emotions affect speech.
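The cross-character emotion transfer described above can be illustrated with a deliberately simplified sketch. It is not 15.ai's actual model: it assumes that each voice can be summarized by a hypothetical two-number "prosody feature", that every character has neutral recordings, and that an emotion shifts any character's neutral baseline by the same additive offset. All names and numbers are invented for illustration.

```python
# Toy illustration only; NOT 15.ai's architecture.
# Assumption: feature(speaker, emotion) = neutral_baseline(speaker) + offset(emotion).

Vector = tuple[float, float]  # hypothetical (pitch shift, energy) feature


def add(a: Vector, b: Vector) -> Vector:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1])


# Invented training data: each character is missing one emotion.
observed: dict[tuple[str, str], Vector] = {
    ("character_a", "neutral"): (1.0, 0.5),
    ("character_a", "joyful"): (1.5, 1.0),
    ("character_b", "neutral"): (-0.5, 0.25),
    ("character_b", "angry"): (-0.75, 1.0),
}


def emotion_offsets(data: dict[tuple[str, str], Vector]) -> dict[str, Vector]:
    """Average, per emotion, the shift from each speaker's neutral baseline."""
    shifts: dict[str, list[Vector]] = {}
    for (speaker, emotion), feature in data.items():
        if emotion == "neutral":
            continue
        baseline = data[(speaker, "neutral")]
        shifts.setdefault(emotion, []).append(sub(feature, baseline))
    return {
        emotion: (
            sum(v[0] for v in vectors) / len(vectors),
            sum(v[1] for v in vectors) / len(vectors),
        )
        for emotion, vectors in shifts.items()
    }


def predict(data, offsets, speaker: str, emotion: str) -> Vector:
    """Estimate a speaker/emotion pair that never appeared in the data."""
    return add(data[(speaker, "neutral")], offsets[emotion])


offsets = emotion_offsets(observed)
print(predict(observed, offsets, "character_a", "angry"))   # (0.75, 1.25)
print(predict(observed, offsets, "character_b", "joyful"))  # (0.0, 0.75)
```

Because the emotion offsets are shared across speakers, an emotion learned from one character can be applied to another; a real multi-speaker neural model would learn far richer, non-additive representations.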
15.ai also made a key contribution in reducing training data requirements for speech synthesis. Earlier systems like Google AI's Tacotron and Microsoft Research's FastSpeech required tens of hours of audio to produce acceptable results and failed to generate intelligible speech with less than 24 minutes of training data. In contrast, 15.ai demonstrated the ability to generate speech with substantially less training data; the name "15.ai" refers to the creator's claim that a voice could be cloned with just 15 seconds of data. This data efficiency influenced later developments in AI voice synthesis, and the 15-second benchmark became a reference point for subsequent systems. The original claim that only 15 seconds of data is required to clone a human's voice was corroborated by OpenAI in 2024.
See also
- AI boom
- Character.ai
- Deepfake
- Ethics of artificial intelligence
- WaveNet
- My Little Pony: Friendship Is Magic fandom
Explanatory footnotes
- The term "faster than real-time" in speech synthesis means that the system can generate audio more quickly than the actual duration of the speech—for example, generating 10 seconds of speech in less than 10 seconds would be considered faster than real-time.
- which uses "11.ai" as a legal byname for its web domain
References
Notes
- 遊戲 2021; Yoshiyuki 2021.
- Kurosawa 2021; Ruppert 2021; Clayton 2021; Morton 2021; Temitope 2024.
- Ng 2020.
- Zwiezen 2021; Chandraseta 2021; Temitope 2024.
- GamerSky 2021.
- Speechify 2024; Temitope 2024; Anirudh VK 2023; Wright 2023.
- Barakat 2024.
- van den Oord 2016.
- Google 2018.
- Kong 2020.
- Kim 2020.
- Temitope 2024.
- Hacker News 2022.
- "The past and future of 15.ai". Twitter. Archived from the original on December 8, 2024. Retrieved December 19, 2024.
- Chandraseta 2021; Temitope 2024.
- Chandraseta 2021.
- Chandraseta 2021; Button 2021.
- "About". fifteen.ai (Official website). February 19, 2020. Archived from the original on February 29, 2020. Retrieved December 23, 2024.
2020-02-19: The web app isn't fully ready just yet
- "About". fifteen.ai (Official website). March 2, 2020. Archived from the original on March 3, 2020. Retrieved December 23, 2024.
- "About". fifteen.ai (Official website). February 19, 2020. Archived from the original on February 29, 2020. Retrieved December 23, 2024.
- Scotellaro 2020a; Scotellaro 2020b.
- Kurosawa 2021; Temitope 2024.
- Zwiezen 2021; Clayton 2021; Ruppert 2021; Morton 2021; Kurosawa 2021; Yoshiyuki 2021.
- Play.ht 2024.
- Baker, Troy (January 14, 2022). "I'm partnering with @VoiceverseNFT to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create. We all have a story to tell. You can hate. Or you can create. What'll it be? https://t.co/cfDGi4q0AZ" (Tweet). Archived from the original on September 16, 2022. Retrieved December 7, 2022 – via Twitter.
- Lawrence 2022; Williams 2022; Wright 2022; Temitope 2024.
- Lopez 2022.
- Phillips 2022b; Lopez 2022.
- Wright 2022; Phillips 2022b; fifteenai 2022.
- Lawrence 2022; Williams 2022.
- ElevenLabs 2024a; Play.ht 2024.
- Williams 2022.
- Phillips 2022b.
- Chandraseta 2021; Menor 2024.
- Zwiezen 2021; Clayton 2021; Morton 2021; Ruppert 2021; Villalobos 2021; Yoshiyuki 2021; Kurosawa 2021.
- Scotellaro 2020b.
- Morton 2021; 遊戲 2021.
- www.equestriacn.com 2021.
- Yoshiyuki 2021.
- Kurosawa 2021; Chandraseta 2021.
- Knight 2017.
- Kurosawa 2021.
- www.equestriacn.com 2021; Kurosawa 2021.
- Clayton 2021; Ruppert 2021; Moto 2021; Scotellaro 2020c; Villalobos 2021.
- Clayton 2021.
- Zwiezen 2021.
- 遊戲 2021.
- MrSun 2021.
- Button 2021.
- Scotellaro 2020a; Scotellaro 2020b; Scotellaro 2020c; Scotellaro 2020d; Scotellaro 2020e; Scotellaro 2020f.
- Paltridge 2021.
- Morton 2021.
- Moto 2021.
- Yoshiyuki 2021: 日本語入力には対応していないが、ローマ字入力でもなんとなくそれっぽい発音になる。; 15.aiはテキスト読み上げサービスだが、特筆すべきはそのなめらかな発音と、ゲームに登場するキャラクター音声を再現している点だ。 (transl. It does not support Japanese input, but even if you input using romaji, it will somehow give you a similar pronunciation.; 15.ai is a text-to-speech service, but what makes it particularly noteworthy is its smooth pronunciation and the fact that it reproduces the voices of characters that appear in games.)
- do Prado 2021; Villalobos 2021.
- zuntata.tistory.com 2021.
- Li 2021.
- Phillips 2022a; Temitope 2024; Menor 2024.
- Lawrence 2022; Phillips 2022a; Wright 2022.
- Phillips 2022a; Menor 2024.
- Phillips 2022a; Lawrence 2022.
- fifteenai 2020; Menor 2024.
- MrSun 2021: 大家是否都曾經想像過,假如能讓自己喜歡的遊戲或是動畫角色說出自己想聽的話,不論是名字、惡搞或是經典名言,都是不少人的夢想吧。不過來到 2021 年,現在這種夢想不再是想想而已,因為有一個網站通過 AI 生成的技術,讓大家可以讓不少遊戲或是動畫角色,說出任何你想要他們講出的東西,而且相似度與音調都有相當高的準確度 (transl. Have you ever imagined what it would be like if your favorite game or anime characters could say exactly what you want to hear? Whether it's names, parodies, or classic quotes, this is a dream for many. However, as we enter 2021, this dream is no longer just a fantasy, because there is a website that uses AI-generated technology, allowing users to make various game and anime characters say anything they want with impressive accuracy in both similarity and tone).
- Anirudh VK 2023.
- Temitope 2024; Morton 2021.
- Scotellaro 2020c; 遊戲 2021; Kurosawa 2021; Morton 2021; Temitope 2024.
- Clayton 2021; Zwiezen 2021; Morton 2021.
- Morton 2021; Kurosawa 2021.
- Ruppert 2021.
- Zwiezen 2021; Morton 2021.
- Scotellaro 2020d.
- Scotellaro 2020e.
- Zwiezen 2021; Ruppert 2021.
- Clayton 2021; CNN 2021.
- "The Heavy on CNN". Reddit. January 19, 2021. Retrieved December 31, 2024.
- Wright 2023.
- ElevenLabs 2024b.
- Speechify 2024.
- ElevenLabs 2024a.
- Ren 2019.
- Chandraseta 2021; Button 2021; Temitope 2024.
- OpenAI 2024; Temitope 2024.
Works cited
- Barakat, Huda; Turk, Oytun; Demiroglu, Cenk (2024). "Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources". EURASIP Journal on Audio, Speech, and Music Processing. 2024 (11).
- Button, Chris (January 19, 2021). "Make GLaDOS, SpongeBob and other friends say what you want with this AI text-to-speech tool". Byteside. Archived from the original on June 25, 2024. Retrieved December 18, 2024.
- Chandraseta, Rionaldi (January 21, 2021). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on January 21, 2021. Retrieved December 18, 2024.
- Clayton, Natalie (January 19, 2021). "Make the cast of TF2 recite old memes with this AI text-to-speech tool". PC Gamer. Archived from the original on January 19, 2021. Retrieved December 18, 2024.
- "CNN Newsroom". CNN. January 15, 2021.
- do Prado, Renan (January 19, 2021). "Faça GLaDOS, Bob Esponja e outros personagens falarem textos escritos por você!" [Make GLaDOS, SpongeBob and other characters speak texts written by you!]. Arkade (in Brazilian Portuguese). Archived from the original on August 19, 2022. Retrieved December 22, 2024.
- "15.AI: Everything You Need to Know & Best Alternatives". ElevenLabs. 2024a. Archived from the original on December 25, 2024. Retrieved December 18, 2024.
Combining speech synthesis with machine learning, deep learning, deep neural networks, and audio synthesis algorithms, 15.ai transformed how users created different voices with AI text.
- "Can I publish the content I generate on the platform?". ElevenLabs (Official website). 2024b. Retrieved December 23, 2024.
- "15.ai已经重新上线,版本更新至v23" [15.ai has been re-launched, version updated to v23]. EquestriaCN (in Chinese). October 1, 2021. Archived from the original on May 19, 2024. Retrieved December 22, 2024.
- @fifteenai (January 14, 2022). "Go fuck yourself" (Tweet) – via Twitter.
- "这个网站可用AI生成语音 让ACG角色"说"出你输入的文本" [This Website Can Use AI to Generate Voice, Making ACG Characters "Say" the Text You Input]. GamerSky (in Chinese). January 18, 2021. Archived from the original on December 11, 2024. Retrieved December 18, 2024.
- "Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". August 30, 2018. Archived from the original on November 11, 2020. Retrieved June 5, 2022.
- "15.ai". Hacker News. June 12, 2022. Retrieved December 29, 2024.
- Kim, Jaehyeon (2020). "Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search". arXiv:2005.11129.
- Knight, Will (August 3, 2017). "An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter". MIT Technology Review. Archived from the original on June 2, 2022. Retrieved December 18, 2024.
- Kong, Jungil (2020). "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis". arXiv:2010.05646.
- Kurosawa, Yuki (January 19, 2021). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる" [Game Character Voice Reading Software "15.ai" Now Available. Get Characters from Undertale and Portal to Say Your Desired Lines]. AUTOMATON (in Japanese). Archived from the original on January 19, 2021. Retrieved December 18, 2024.
英語版ボイスのみなので注意。;もうひとつ15.aiの大きな特徴として挙げられるのが、豊かな感情表現だ。
[Please note that only English voices are available.;Another major feature of 15.ai is its rich emotional expression.]
- Lawrence, Briana (January 19, 2022). "Shonen Jump Scare Leads to Company Reassuring Fans That They Aren't Getting Into NFTs". The Mary Sue. Retrieved December 23, 2024.
- Li, Yongqiang (2021). "语音开源项目优选:免费配音网站15.ai" [Voice Open Source Project Selection: Free Voice Acting Website 15.ai]. Zhihu (in Chinese). Archived from the original on December 19, 2024. Retrieved December 18, 2024.
- Lopez, Ule (January 16, 2022). "Voiceverse NFT Service Reportedly Uses Stolen Technology from 15ai [UPDATE]". Wccftech. Archived from the original on January 16, 2022. Retrieved June 7, 2022.
- Menor, Deion (November 7, 2024). "15.ai – Natural and Emotional Text-to-Speech Using Neural Networks". HashDork. Retrieved January 3, 2025.
- Morton, Lauren (January 18, 2021). "Put words in game characters' mouths with this fascinating text to speech tool". Rock, Paper, Shotgun. Archived from the original on January 18, 2021. Retrieved December 18, 2024.
- Moto, Eugenio (January 20, 2021). "15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras" [15.ai, the Site That Lets You Use the Voices of Popular Characters to Say Whatever You Want]. Qore (in Spanish). Archived from the original on December 28, 2024. Retrieved December 21, 2024.
Si bien los resultados ya son excepcionales, sin duda pueden mejorar más
[While the results are already exceptional, without a doubt they can improve even more]
- MrSun (January 19, 2021). "讓你喜愛的ACG角色說出任何話! AI生成技術幫助你實現夢想" [Let your favorite ACG characters say anything! AI generation technology helps you realize your dreams]. Yahoo (in Chinese). Archived from the original on December 28, 2024. Retrieved December 22, 2024.
- Ng, Andrew (April 1, 2020). "Voice Cloning for the Masses". DeepLearning.AI. Archived from the original on December 28, 2024. Retrieved December 22, 2024.
- "Navigating the Challenges and Opportunities of Synthetic Voices". OpenAI. March 9, 2024. Archived from the original on November 25, 2024. Retrieved December 18, 2024.
- Ruppert, Liana (January 18, 2021). "Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App". Game Informer. Archived from the original on January 18, 2021. Retrieved December 18, 2024.
- Paltridge, Peter (January 18, 2021). "This Website Will Say Whatever You Type In Spongebob's Voice". Anime Superhero News. Archived from the original on October 17, 2021. Retrieved December 22, 2024.
- Phillips, Tom (January 14, 2022). "Video game voice actor Troy Baker is now promoting NFTs". Eurogamer. Retrieved December 31, 2024.
- Phillips, Tom (January 17, 2022). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Eurogamer. Archived from the original on January 17, 2022. Retrieved December 31, 2024.
- "Everything You Need to Know About 15.ai: The AI Voice Generator". Play.ht. September 12, 2024. Archived from the original on December 25, 2024. Retrieved December 18, 2024.
- Ren, Yi (2019). "FastSpeech: Fast, Robust and Controllable Text to Speech". arXiv:1905.09263.
- "Free 15.ai Character Voice Cloning and Alternatives". Resemble.ai. October 17, 2024. Retrieved December 31, 2024.
- Scotellaro, Shaun (2020a). "Rainbow Dash Voice Added to 15.ai". Equestria Daily. Archived from the original on December 1, 2024. Retrieved December 18, 2024.
- Scotellaro, Shaun (2020b). "15.ai Adds Tons of New Pony Voices". Equestria Daily. Archived from the original on December 26, 2024. Retrieved December 21, 2024.
- Scotellaro, Shaun (2020c). "Neat "Pony Preservation Project" Using Neural Networks to Create Pony Voices". Equestria Daily. Archived from the original on June 23, 2021. Retrieved December 18, 2024.
- Scotellaro, Shaun (2020d). "Full Simple Animated Episode - The Tax Breaks (Twilight)". Equestria Daily. Retrieved January 1, 2025.
- Scotellaro, Shaun (2020e). "More Pony Music! We Shine Brighter Together!". Equestria Daily. Retrieved January 1, 2025.
- Scotellaro, Shaun (2020f). "New Among Us Animation Goes Viral... With Pony Voices". Equestria Daily. Retrieved January 1, 2025.
- Temitope, Yusuf (December 10, 2024). "15.ai Creator reveals journey from MIT Project to internet phenomenon". The Guardian. Archived from the original on December 28, 2024. Retrieved December 25, 2024.
- "게임 캐릭터 음성으로 영어를 읽어주는 소프트 15.ai 공개" [Software 15.ai Released That Reads English in Game Character Voices]. Tistory (in Korean). January 20, 2021. Archived from the original on December 20, 2024. Retrieved December 18, 2024.
- 遊戲, 遊戲角落 (January 20, 2021). "這個AI語音可以模仿《傳送門》GLaDOS講出任何對白!連《Undertale》都可以學" [This AI Voice Can Imitate Portal's GLaDOS Saying Any Dialog! It Can Even Learn Undertale]. United Daily News (in Chinese (Taiwan)). Archived from the original on December 19, 2024. Retrieved December 18, 2024.
- van den Oord, Aaron; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (September 12, 2016). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499.
- Villalobos, José (January 18, 2021). "Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras" [Discover 15.AI, a Website Where You Can Make GlaDOS Say What You Want]. LaPS4 (in Spanish). Archived from the original on January 18, 2021. Retrieved January 18, 2021.
La dirección es 15.AI y funciona tan fácil como parece.
[The address is 15.AI and it works as easy as it looks.]
- Anirudh VK (March 18, 2023). "Deepfakes Are Elevating Meme Culture, But At What Cost?". Analytics India Magazine. Archived from the original on December 26, 2024. Retrieved December 18, 2024.
While AI voice memes have been around in some form since '15.ai' launched in 2020,
- Weitzman, Cliff (November 19, 2023). "15.ai: All about 15.ai and the best alternative". Speechify. Retrieved December 31, 2024.
- Williams, Demi (January 18, 2022). "Voiceverse NFT admits to taking voice lines from non-commercial service". NME. Archived from the original on January 18, 2022. Retrieved December 18, 2024.
- Wright, Steve (January 17, 2022). "Troy Baker-backed NFT company admits to using content without permission". Stevivor. Archived from the original on January 17, 2022. Retrieved December 18, 2024.
- Wright, Steven (March 21, 2023). "Why Biden, Trump, and Obama Arguing Over Video Games Is YouTube's New Obsession". Inverse. Archived from the original on December 20, 2024. Retrieved December 18, 2024.
AI voice tools used to create "audio deepfakes" have existed for years in one form or another, with 15.ai being a notable example.
- Yoshiyuki, Furushima (January 18, 2021). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に" [Portal's GLaDOS and UNDERTALE's Sans Will Read Text for You. "15.ai" Service Aims to Reproduce Even the Emotions in Text, Becomes Topic of Discussion]. Den Fami Nico Gamer (in Japanese). Archived from the original on January 18, 2021. Retrieved December 18, 2024.
- Zwiezen, Zack (January 18, 2021). "Website Lets You Make GLaDOS Say Whatever You Want". Kotaku. Archived from the original on January 17, 2021. Retrieved December 18, 2024.