{{short description|CAPTCHA implementation owned by Google}}
{{About|a specific implementation of a CAPTCHA|the original test|CAPTCHA}}
{{Lowercase title}}
{{Use mdy dates|date=January 2019}}
{{Infobox software
| name = reCAPTCHA Inc.
| logo = RecaptchaLogo.svg
| logo size = 130px
| author = {{plainlist|
* ]
* ]
* David Abraham
* Michael Crawford
* Ben Maurer
* Colin McMillen
* Edison Tan
}}
| developer = ]
| genre = Classic version: ] <br /> New version: ]
| website = {{URL|https://google.com/recaptcha}}
| released = {{start date and age|2007|5|27}}
}}
'''reCAPTCHA Inc.'''<ref>{{Cite web |date=2007-08-28 |title=Recaptcha Inc. |url=https://opencorporates.com/companies/us_de/4414133 |access-date=2023-08-20 |website=] |archive-date=August 20, 2023 |archive-url=https://web.archive.org/web/20230820085852/https://opencorporates.com/companies/us_de/4414133 |url-status=live }}</ref> is a ] system owned by ]. It enables web hosts to distinguish between human and automated access to websites. The original version asked users to decipher hard-to-read text or match images. Version 2 also asked users to decipher text or match images if the analysis of cookies and canvas rendering suggested the page was being downloaded automatically.<ref name="No CAPTCHA">{{cite web |last=Shet |first=Vinay |date=December 3, 2014 |title=Are you a robot? Introducing "No CAPTCHA reCAPTCHA"<!-- The non-breaking space ( ) is needed to hide the wrongful "Cite uses generic title" message--> |url=https://security.googleblog.com/2014/12/are-you-robot-introducing-no-captcha.html |access-date=24 February 2021 |archive-date=September 3, 2020|archive-url=https://web.archive.org/web/20200903034319/https://security.googleblog.com/2014/12/are-you-robot-introducing-no-captcha.html|url-status=live}}</ref> Since version 3, reCAPTCHA will never interrupt users and is intended to run automatically when users load pages or click buttons.<ref>{{cite web| url=https://developers.google.com/recaptcha/docs/v3| title=reCAPTCHA v3| access-date=September 8, 2020| archive-date=September 25, 2020| archive-url=https://web.archive.org/web/20200925031307/https://developers.google.com/recaptcha/docs/v3| url-status=live}}</ref>
The original iteration of the service was a ] platform designed for the digitization of books, particularly those that were too illegible to be ]. The verification prompts utilized pairs of words from scanned pages, with one known word used as a control for verification, and the second used to ] the reading of an uncertain word.<ref>{{Citation|last=Ahn|first=Luis von|title=Massive-scale online collaboration|date=December 6, 2011 |url=https://www.ted.com/talks/luis_von_ahn_massive_scale_online_collaboration|language=en|access-date=2020-04-14|archive-date=July 15, 2020|archive-url=https://web.archive.org/web/20200715092310/https://www.ted.com/talks/luis_von_ahn_massive_scale_online_collaboration|url-status=live}}</ref> reCAPTCHA was originally developed by ], David Abraham, ], Michael Crawford, Ben Maurer, Colin McMillen, and Edison Tan at ] main ] campus.<ref>{{Cite web|title = reCAPTCHA: About Us|url=http://recaptcha.net/aboutus.html | archive-url = https://web.archive.org/web/20100611210259/http://recaptcha.net/aboutus.html|access-date = 2018-08-14 | archive-date=2010-06-11}}</ref> It was acquired by ] in September 2009.<ref name="AutoK4-1" /> The system helped to digitize the archives of '']'', and was subsequently used by ] for similar purposes.<ref>{{cite news|url=https://www.nytimes.com/2011/03/29/science/29recaptcha.html|title=Deciphering Old Texts, One Woozy, Curvy Word at a Time|date=March 28, 2011|work=The New York Times|access-date=November 20, 2017|archive-date=November 17, 2017|archive-url=https://web.archive.org/web/20171117172409/http://www.nytimes.com/2011/03/29/science/29recaptcha.html|url-status=live}}</ref>
reCAPTCHA supplies subscribing websites with images of words that ] (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects.
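The pairing of a known control word with an unread word can be sketched as a small grading routine. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual code: the `Challenge` and `VoteLog` types, the word identifiers, and the trim-and-lowercase comparison are all hypothetical. The point it shows is that only the control word decides whether the visitor passes, while the answer for the unread word is merely recorded as one vote toward its transcription.

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    """A hypothetical two-word challenge: one known word plus one unread word."""
    control_word: str  # reading already established
    unknown_id: str    # hypothetical identifier of the word OCR could not read


@dataclass
class VoteLog:
    """Collects visitors' readings of unread words, keyed by word identifier."""
    readings: dict = field(default_factory=dict)

    def grade(self, challenge: Challenge, typed_control: str, typed_unknown: str) -> bool:
        # Assumption: answers are compared after trimming and lowercasing.
        if typed_control.strip().lower() != challenge.control_word.lower():
            return False  # control word wrong: reject the visitor, record nothing
        # Control word right: let the visitor through and keep their reading
        # of the unread word as one vote toward its transcription.
        self.readings.setdefault(challenge.unknown_id, []).append(typed_unknown.strip())
        return True


log = VoteLog()
c = Challenge(control_word="harbor", unknown_id="scan-17-word-4")
print(log.grade(c, "Harbor ", "upon"))  # True
print(log.grade(c, "harper", "upon"))   # False
print(log.readings)                     # {'scan-17-word-4': ['upon']}
```

Because a wrong control word discards the whole response, a visitor who is guessing contributes nothing to the transcription, which is the property the paired design relies on.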
The system was reported as displaying over 100 million CAPTCHAs every day,<ref name="AutoK4-2" /> on sites such as ], Ticketmaster, Twitter, ], ], ],<ref name="BBCreport" /> ] (since June 2008),<ref name="craig" /> and the U.S. National Telecommunications and Information Administration's ] coupon program website (as part of the ]).<ref name="AutoK4-5" />
In 2014, Google pivoted the service away from its original concept, with a focus on reducing the amount of user interaction needed to verify a user, and only presenting human recognition challenges (such as identifying images in a set that satisfy a specific prompt) if behavioral analysis suspects that the user may be a bot.
In October 2023, it was found that OpenAI's ] chatbot could solve CAPTCHAs.<ref name=":0">{{Cite web |last=Edwards |first=Benj |date=2023-10-02 |title=Dead grandma locket request tricks Bing Chat's AI into solving security puzzle |url=https://arstechnica.com/information-technology/2023/10/sob-story-about-dead-grandma-tricks-microsoft-ai-into-solving-captcha/ |url-status=live |archive-url=https://web.archive.org/web/20231010194959/https://arstechnica.com/information-technology/2023/10/sob-story-about-dead-grandma-tricks-microsoft-ai-into-solving-captcha/ |archive-date=October 10, 2023 |access-date=2023-10-25 |website=Ars Technica |language=en-us}}</ref>
== Origin ==
] was the first project to volunteer its time to decipher scanned text that could not be read by ] (OCR) programs. It works with ] to digitize ] material and uses methods quite different from reCAPTCHA.
The reCAPTCHA program originated with Guatemalan ] ],<ref name="CBC2" /> and was aided by a ]. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles".<ref name="AutoK4-9" />
== Operation ==
=== reCAPTCHA v1 (human-assisted OCR){{anchor|ReCAPTCHA v1}} ===
Scanned text is subjected to analysis by two different OCRs. Any word that is deciphered differently by the two OCR programs or that is not in an English dictionary is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed, out of context, sometimes along with a control word already known. If the human types the control word correctly, then the response to the questionable word is accepted as probably valid. If enough users were to correctly type the control word, but incorrectly type the second word which OCR had failed to recognize, then the digital version of documents could end up containing the incorrect word. The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification reaches 2.5 points, the word is considered valid. Those words that are consistently given a single identity by human judges are later recycled as control words.<ref name="AutoK4-8" /> If the first three guesses match each other but do not match either of the OCRs, they are considered a correct answer, and the word becomes a control word.<ref name="Ahn, Ben Maurer 2008">{{cite journal | last1 = von Ahn | first1 = Luis | last2 = Maurer | first2 = Ben | last3 = McMillen | first3 = Colin | last4 = Abraham | first4 = David | last5 = Blum | first5 = Manuel | year = 2008 | title = reCAPTCHA: Human-Based Character Recognition via Web Security Measures | journal = Science | volume = 321 | issue = 5895| pages = 1465–1468 | doi = 10.1126/science.1160379 | pmid = 18703711 | bibcode = 2008Sci...321.1465V | citeseerx = 10.1.1.141.6563| s2cid = 18371056 }}</ref> When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable.<ref name="Ahn, Ben Maurer 2008" />
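The scoring rules above can be restated as a short sketch. The weights (0.5 per OCR reading, 1 per human reading), the 2.5-point threshold and the six-rejection cutoff come from the paragraph; everything else is an assumption made for illustration: responses are replayed in arrival order, `None` stands for a visitor rejecting the word, and the function names and return tuples are hypothetical rather than reCAPTCHA's own.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

OCR_WEIGHT = 0.5        # each OCR program's reading
HUMAN_WEIGHT = 1.0      # each human's reading
ACCEPT_THRESHOLD = 2.5  # points needed before a reading is accepted
MAX_REJECTIONS = 6      # rejections that mark a word as unreadable


def is_suspicious(ocr_a: str, ocr_b: str, dictionary: set) -> bool:
    """A word is sent to humans if the OCRs disagree or it is not a dictionary word."""
    return ocr_a != ocr_b or ocr_a.lower() not in dictionary


def resolve(ocr_readings: Iterable[str],
            human_responses: Iterable[Optional[str]]) -> Tuple[str, Optional[str]]:
    """Replay responses in order; None means the visitor rejected the word."""
    scores = defaultdict(float)
    for reading in ocr_readings:
        scores[reading] += OCR_WEIGHT
    rejections = 0
    for response in human_responses:
        if response is None:
            rejections += 1
            if rejections >= MAX_REJECTIONS:
                return ("discarded", None)  # six rejections before any acceptance
            continue
        scores[response] += HUMAN_WEIGHT
        if scores[response] >= ACCEPT_THRESHOLD:
            return ("accepted", response)
    return ("pending", None)  # not enough agreement yet


# One OCR plus two humans agreeing: 0.5 + 1 + 1 = 2.5 points.
print(resolve(["upon", "upan"], ["upon", "upon"]))           # ('accepted', 'upon')
# Three humans agreeing on a reading neither OCR produced: 3 points.
print(resolve(["upan", "apon"], ["upon", "upon", "upon"]))   # ('accepted', 'upon')
```

The second example shows why the "first three guesses match" rule in the text is consistent with the point system: three human votes alone already exceed 2.5 points.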
The original reCAPTCHA method was designed to show the questionable words separately, as out-of-context correction, rather than in use, such as within a phrase of five words from the original document.<ref name="DM" /> Also, the control word might mislead the context for the second word, such as a request of "/metal/ /fife/" being entered as "metal ]" due to the logical connection of filing with a metal tool being considered more common than the musical instrument "]".{{citation needed|date=October 2012}}
In 2012, reCAPTCHA began using photographs taken from ] project, in addition to scanned words.<ref>{{cite web |url=https://techcrunch.com/2012/03/29/google-now-using-recaptcha-to-decode-street-view-addresses/ |title=Google Now Using ReCAPTCHA To Decode Street View Addresses |work=TechCrunch |date=March 29, 2012 |access-date=2013-07-10 |first=Sarah |last=Perez |archive-date=August 23, 2012 |archive-url=https://web.archive.org/web/20120823005658/http://techcrunch.com/2012/03/29/google-now-using-recaptcha-to-decode-street-view-addresses/ |url-status=live }}</ref> It will ask the user to identify images of crosswalks, street lights, and other objects. It has been hypothesized that the data is used by ] (a Google subsidiary) to train autonomous vehicles, though an unnamed representative has denied this, claiming the data was only being used to improve Google Maps as of mid-2021.<ref>{{Cite web |last=Vega |first=Edward |date=2021-05-14 |title=Why captchas are getting harder |url=https://www.vox.com/22436832/captchas-getting-harder-ai-artificial-intelligence |access-date=2022-04-15 |website=Vox |language=en |archive-date=April 15, 2022 |archive-url=https://web.archive.org/web/20220415041740/https://www.vox.com/22436832/captchas-getting-harder-ai-artificial-intelligence |url-status=live }}</ref>
Google charges for the use of reCAPTCHA on websites that make over a million reCAPTCHA queries a month.<ref name="d"/>
reCAPTCHA v1 was declared ] and shut down on March 31, 2018.<ref>{{Cite web |title=Google reCAPTCHA v1 API Shutting Down in March 2018 |url=https://www.programmableweb.com/news/google-recaptcha-v1-api-shutting-down-march-2018/brief/2017/10/24 |url-status=live |archive-url=https://web.archive.org/web/20200620111320/https://www.programmableweb.com/news/google-recaptcha-v1-api-shutting-down-march-2018/brief/2017/10/24 |archive-date=2020-06-20 |access-date=2020-04-14 |website=ProgrammableWeb |language=en}}</ref>
=== reCAPTCHA v2 (checkbox) ===
In 2013, reCAPTCHA began implementing ] of the browser's interactions to predict whether the user was a human or a bot. The following year, Google began to deploy a new reCAPTCHA API, featuring the "no CAPTCHA reCAPTCHA", where users deemed to be of low risk only need to click a single ] to verify their identity. A CAPTCHA may still be presented if the system is uncertain of the user's risk; Google also introduced a new type of CAPTCHA challenge designed to be more accessible to mobile users, where the user must select images matching a specific prompt from a grid.<ref name="No CAPTCHA" /><ref name="oneclick">{{cite magazine |last=Greenberg |first=Andy |date=3 December 2014 |title=Google Can Now Tell You're Not a Robot with Just One Click |url=https://www.wired.com/2014/12/google-one-click-recaptcha/ |magazine=] |access-date=1 October 2015 |archive-date=2 October 2015 |archive-url=https://web.archive.org/web/20151002161845/http://www.wired.com/2014/12/google-one-click-recaptcha/ |url-status=live}}</ref>
=== reCAPTCHA v3 and reCAPTCHA Enterprise (invisible) ===
In 2017, Google introduced a new "invisible" reCAPTCHA, where verification occurs in the background, and no challenges are displayed at all if the user is deemed to be of low risk.<ref name="Fast Company" /><ref>{{Cite web |url=https://arstechnica.com/gadgets/2017/03/googles-recaptcha-announces-invisible-background-captchas/ |title=Google's reCAPTCHA turns 'invisible,' will separate bots from people without challenges |last=Amadeo |first=Ron |date=2017-03-09 |website=Ars Technica |language=en-us |access-date=2020-04-14 |archive-date=2020-08-06|archive-url=https://web.archive.org/web/20200806153852/https://arstechnica.com/gadgets/2017/03/googles-recaptcha-announces-invisible-background-captchas/ |url-status=live}}</ref><ref name=":1" /> According to former Google "] czar" ], this capability "creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human."<ref name=":1">{{cite magazine |url=http://www.popsci.com/google-invisible-recaptcha#page-3 |title=Google just made the internet a tiny bit less annoying |magazine=] |date=10 March 2017 |access-date=2017-04-05 |archive-date=5 February 2021 |archive-url=https://web.archive.org/web/20210205012217/https://www.popsci.com/google-invisible-recaptcha/#page-3 |url-status=live }}</ref>
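Because the invisible version shows no challenge, the site has to decide for itself what to do with the result. Google's public v3 documentation describes a verification reply containing `success`, a `score` between 0.0 and 1.0 (higher meaning more likely human) and the `action` name the page reported; the sketch below is a minimal server-side decision under that assumption. The function name, the example action names and the 0.5 threshold are illustrative choices by the site operator, not values fixed by reCAPTCHA.

```python
import json


def accept_v3(response_body: str, expected_action: str, threshold: float = 0.5) -> bool:
    """Decide whether to let a request through, given a v3 verification reply (JSON text)."""
    data = json.loads(response_body)
    if not data.get("success", False):
        return False  # verification failed, e.g. a bad or expired token
    if data.get("action") != expected_action:
        return False  # token was issued for a different action on the page
    # Higher scores mean the interaction looked more human.
    return float(data.get("score", 0.0)) >= threshold


reply = '{"success": true, "score": 0.9, "action": "login"}'
print(accept_v3(reply, "login"))     # True
print(accept_v3(reply, "checkout"))  # False
```

Checking the action name as well as the score is a design choice in this sketch: it stops a token obtained on one low-value page from being replayed against a more sensitive form.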
== Implementation ==
The reCAPTCHA tests are displayed from the central site of the reCAPTCHA project, which supplies the words to be deciphered. This is done through a ] ] with the server making a callback to reCAPTCHA after the request has been submitted. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is provided to websites free of charge, in return for their assistance with the decipherment,<ref name="FAQ" /> but the reCAPTCHA software is not ].<ref name="google">{{cite web|url=http://www.google.com/recaptcha|title=reCAPTCHA: Stop Spam, Read Books|access-date=2014-01-14|archive-date=June 19, 2020|archive-url=https://web.archive.org/web/20200619191856/https://www.google.com/recaptcha/|url-status=live}}</ref>
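For the current (v2 and v3) API, the server-side callback is an HTTPS POST to Google's documented `siteverify` endpoint, carrying the site's secret key, the token the widget placed in the submitted form, and optionally the visitor's IP address; the retired v1 service used a different endpoint. The sketch below builds that request with Python's standard library only. The secret, token and IP values are placeholders, the five-second timeout is an arbitrary assumption, and `verify` needs network access, so only the request construction is exercised here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str,
                         remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build the form-encoded POST that asks Google to check a visitor's token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional parameter
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        VERIFY_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def verify(secret: str, token: str, remote_ip: Optional[str] = None,
           timeout: float = 5.0) -> bool:
    """Send the request and return Google's "success" flag (requires network access)."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        data = json.load(reply)
    return bool(data.get("success", False))


req = build_verify_request("placeholder-secret", "token-from-form", "203.0.113.7")
print(req.get_method(), req.full_url)
```

The secret key must only ever be sent from the website's server; the browser sees the token, never the secret.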
Also, reCAPTCHA offers plugins for several web-application platforms including ], ], and ], to ease the implementation of the service.<ref name="google2">{{cite web|url=https://developers.google.com/recaptcha/intro?csw=1|title=Developer's Guide—reCAPTCHA |publisher=Google Inc.|access-date=2014-01-14|archive-date=November 24, 2017|archive-url=https://web.archive.org/web/20171124231704/https://developers.google.com/recaptcha/intro?csw=1|url-status=live}}</ref>
== Security ==
The main purpose of a ] system is to block spambots while allowing human users. On December 14, 2009, Jonathan Wilkins released a paper describing weaknesses in reCAPTCHA that allowed bots to achieve a solve rate of 18%.<ref name="Strong CAPTCHA Guidelines" /><ref name="Register Article" /><ref name="H-online Article">{{cite web |url=http://www.h-online.com/security/news/item/Google-s-reCAPTCHA-dented-888859.html |title=Google's reCAPTCHA dented |access-date=31 January 2011 |archive-date=10 March 2010 |archive-url=https://web.archive.org/web/20100310082513/http://www.h-online.com/security/news/item/Google-s-reCAPTCHA-dented-888859.html |url-status=live }}</ref>
On August 1, 2010, Chad Houck gave a presentation to the ] 18 Hacking Conference detailing a method to reverse the distortion added to images which allowed a computer program to determine a valid response 10% of the time.<ref name="Speaker Program" /><ref name="Decoding reCAPTCHA" /> The reCAPTCHA system was modified on July 21, 2010, before Houck was to speak on his method. Houck modified his method to what he described as an "easier" CAPTCHA to determine a valid response 31.8% of the time. Houck also mentioned security defenses in the system, including a high-security lockout if an invalid response is given 32 times in a row.<ref name="Decoding reCAPTCHA pptx" />
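The lockout Houck described amounts to a counter of consecutive failures. The sketch below illustrates that idea only; Houck's account gives the limit of 32 but not how clients are identified or whether a lockout expires, so keying on a client identifier, resetting the streak on a valid answer, and never lifting an active lockout are all assumptions, and `FailureTracker` is a hypothetical name.

```python
from collections import defaultdict


class FailureTracker:
    """Counts consecutive invalid answers per client (hypothetical lockout policy)."""

    def __init__(self, limit: int = 32):  # 32 consecutive failures, as Houck reported
        self.limit = limit
        self.streak = defaultdict(int)

    def locked(self, client_id: str) -> bool:
        return self.streak[client_id] >= self.limit

    def record(self, client_id: str, valid: bool) -> bool:
        """Record one answer; return True if the client is (now) locked out."""
        if self.locked(client_id):
            return True  # assumption: later answers do not lift a lockout
        if valid:
            self.streak[client_id] = 0  # a valid answer breaks the streak
        else:
            self.streak[client_id] += 1
        return self.locked(client_id)


tracker = FailureTracker()
for _ in range(31):
    tracker.record("198.51.100.1", valid=False)
print(tracker.record("198.51.100.1", valid=False))  # True: 32nd failure in a row
```

Counting only consecutive failures means an automated solver that is right even occasionally keeps resetting its streak, which is consistent with the partial solve rates reported above still being usable against the system.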
On May 26, 2012, Adam, C-P, and Jeffball of DC949 gave a presentation at the LayerOne hacker conference detailing how they were able to achieve an automated solution with an accuracy rate of 99.1%.<ref name="Project Stiltwalker" /> Their tactic was to use techniques from machine learning, a subfield of artificial intelligence, to analyze the audio version of reCAPTCHA which is available for the visually impaired. Google released a new version of reCAPTCHA just hours before their talk, making major changes to both the audio and visual versions of their service. In this release, the audio version was increased in length from 8 seconds to 30 seconds and is much more difficult to understand, both for humans as well as bots. In response to this update and the following one, the members of DC949 released two more versions of Stiltwalker which beat reCAPTCHA with an accuracy of 60.95% and 59.4% respectively. After each successive break, Google updated reCAPTCHA within a few days. According to DC949, they often reverted to features that had been previously hacked.
On June 27, 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes published a paper showing a system running on reCAPTCHA images with an accuracy of 82%.<ref>{{Citation |last=Cruz-Perez |first=Claudia |title=Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character Segmentation and Recognition |date=2012-06-27 |work=Pattern Recognition |volume=7329 |pages=155{{ndash}}165 |editor-last=Carrasco-Ochoa |editor-first=Jesús Ariel |url=http://link.springer.com/10.1007/978-3-642-31149-9_16 |access-date=2013-01-23 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |doi=10.1007/978-3-642-31149-9_16 |isbn=978-3-642-31148-2 |s2cid=29097170 |last2=Starostenko |first2=Oleg |last3=Uceda-Ponga |first3=Fernando |last4=Alarcon-Aquino |first4=Vicente |last5=Reyes-Cabrera |first5=Leobardo |editor2-last=Martínez-Trinidad |editor2-first=José Francisco |editor3-last=Olvera López |editor3-first=José Arturo |editor4-last=Boyer |editor4-first=Kim L.}}</ref> The authors have not said if their system can solve recent reCAPTCHA images, although they claim their work to be ] and robust to some, if not all changes in the image database.
In an August 2012 presentation given at BsidesLV 2012, DC949 called the latest version "unfathomably impossible for humans"; they were not able to solve them manually either.<ref name="Project Stiltwalker" /> The web accessibility organization WebAIM reported in May 2012, "Over 90% of respondents find CAPTCHA to be very or somewhat difficult".<ref name="webAIM">{{cite web |url=http://webaim.org/projects/screenreadersurvey4/#captcha/ |title=Screen Reader User Survey #4 Results |access-date=19 April 2013 |archive-date=10 December 2017 |archive-url=https://web.archive.org/web/20171210071614/https://webaim.org/projects/screenreadersurvey4/#captcha/ |url-status=live}}</ref>
== Criticism ==
The original iteration of reCAPTCHA was criticized as being a source of ] to assist in transcribing efforts.<ref>{{cite web|url=http://www.bizjournals.com/boston/blog/techflash/2015/01/massachusetts-womans-lawsuit-accuses-google-of.html|title=Massachusetts woman's lawsuit accuses Google of using free labor to transcribe books, newspapers|last=Harris|first=David L.|date=Jan 23, 2015|website=Boston Business Journal|access-date=September 4, 2015|archive-date=April 28, 2015|archive-url=https://web.archive.org/web/20150428030942/http://www.bizjournals.com/boston/blog/techflash/2015/01/massachusetts-womans-lawsuit-accuses-google-of.html|url-status=live}}</ref>
Critics have also argued that Google profits from reCAPTCHA users by treating them as unpaid workers who help improve its AI research.<ref>{{Cite web |title=No CAPTCHA: yet another ruse devised by Google to extract free digital labor from you |url=https://www.casilli.fr/2014/12/05/no-captcha-is-google-jargon-for-mechanical-turk-for-free/ |access-date=December 3, 2020 |archive-date=November 12, 2020 |archive-url=https://web.archive.org/web/20201112024721/http://www.casilli.fr/2014/12/05/no-captcha-is-google-jargon-for-mechanical-turk-for-free/ |url-status=live }}</ref>
=== Privacy ===
The current iteration of the system has been criticized for its reliance on ] and promotion of ] with Google services; administrators are encouraged to include reCAPTCHA tracking code on all pages of their website to analyze the behavior and "risk" of users, which determines the level of friction presented when a reCAPTCHA prompt is used.<ref>{{Cite web |last=Taylor |first=Chris |date=February 26, 2024 |title=Stop giving your website data away! |url=https://prosopo.io/articles/stop-giving-your-website-data-away/ |website=Prosopo}}</ref> Google stated in its ] that user data collected in this manner is not used for personalized advertising. It was also discovered that the system favors those who have an active ] login, and displays a higher risk towards those using anonymizing proxies and VPN services.<ref name="Fast Company">{{Cite web|url=https://www.fastcompany.com/90369697/googles-new-recaptcha-has-a-dark-side|title=Google's new reCAPTCHA has a dark side|last=Schwab|first=Katharine|date=2019-06-27|website=Fast Company|language=en-US|access-date=2020-04-08|archive-date=June 28, 2019|archive-url=https://web.archive.org/web/20190628221142/https://www.fastcompany.com/90369697/googles-new-recaptcha-has-a-dark-side|url-status=live}}</ref>
Concerns were raised regarding privacy when Google announced reCAPTCHA v3.0, as it allows Google to track users on non-Google websites.<ref name="Fast Company" />
In April 2020, ] switched from reCAPTCHA to <!--]-->, citing privacy concerns over Google's potential use of the data it collects through reCAPTCHA for ]<ref>{{Cite web |date=2020-04-08 |title=Moving from reCAPTCHA to hCaptcha |url=https://blog.cloudflare.com/moving-from-recaptcha-to-hcaptcha/ |access-date=2020-07-18 |website=The Cloudflare Blog |language=en |archive-date=August 12, 2020 |archive-url=https://web.archive.org/web/20200812233420/https://blog.cloudflare.com/moving-from-recaptcha-to-hcaptcha/ |url-status=live }}</ref> and to cut down on operating costs, since a considerable portion of Cloudflare's customers are non-paying customers. In response, Google told '']'' that the data from reCAPTCHA is never used for personalized advertising purposes.<ref name="d">{{Cite web |title=Cloudflare Dumps Google's ReCAPTCHA Over Privacy Concerns, Costs |url=https://www.pcmag.com/news/cloudflare-dumps-googles-recaptcha-over-privacy-concerns-costs |access-date=2020-07-18 |website=] |language=en |archive-date=July 19, 2020 |archive-url=https://web.archive.org/web/20200719214415/https://www.pcmag.com/news/cloudflare-dumps-googles-recaptcha-over-privacy-concerns-costs |url-status=live }}</ref>
=== Accessibility === | |||
Google's help center states that reCAPTCHA is not ] for the ] community,<ref>{{Cite web |url=https://support.google.com/a/answer/1217728 |title=What is CAPTCHA? - G Suite Admin Help |access-date=May 11, 2020 |archive-date=August 6, 2020 |archive-url=https://web.archive.org/web/20200806173938/https://support.google.com/a/answer/1217728 |url-status=live }}</ref> effectively locking such users out of all pages that use the service. However, reCAPTCHA does currently have the longest list of accessibility considerations of any CAPTCHA service.<ref>{{Cite web |url=https://blog.teamtreehouse.com/wcag-1-1-text-alternatives |title=WCAG 1.1: Text Alternatives [Article] |date=October 6, 2020 |access-date=December 10, 2020 |archive-date=November 26, 2020 |archive-url=https://web.archive.org/web/20201126104424/https://blog.teamtreehouse.com/wcag-1-1-text-alternatives |url-status=live }}</ref> | |||
=== Interface ===
In one variant of the image challenge, selected images are not highlighted; instead, they fade out when clicked and are replaced by new images that fade in, resembling whack-a-mole.
Users have criticized how long the images take to fade out and in.<ref name=GitHub-fading>{{cite web |title=ReCaptcha extremly [sic] slow fading · Issue #268 · google/recaptcha |url=https://github.com/google/recaptcha/issues/268 |website=GitHub |language=en |access-date=October 14, 2020 |archive-date=October 14, 2020 |archive-url=https://web.archive.org/web/20201014144316/https://github.com/google/recaptcha/issues/268 |url-status=live }}</ref>
== Derivative projects ==
reCAPTCHA also created the Mailhide project, which protected email addresses on web pages from being harvested by spammers.<ref name="Mailhide" /> By default, the email address was converted into a format that did not allow a crawler to see the full email address; for example, "mailme@example.com" would have been converted to "mai...@example.com". The visitor would then click on the "..." and solve the CAPTCHA to obtain the full email address. One could also edit the pop-up code so that no part of the address was visible. Mailhide was discontinued in 2018 because it relied on reCAPTCHA v1.<ref name="MailhideDiscontinued" />
== References ==
{{reflist|refs=
<ref name="AutoK4-1">{{cite web |url=http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html |title=Teaching computers to read: Google acquires reCAPTCHA |access-date=2009-09-16 |archive-date=May 19, 2013 |archive-url=https://web.archive.org/web/20130519014809/http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html |url-status=live }}</ref>
<ref name="AutoK4-2">{{cite web|url=http://www.google.com/recaptcha/faq|title=reCAPTCHA FAQ|access-date=2011-06-12|archive-date=July 5, 2010|archive-url=https://web.archive.org/web/20100705103425/http://www.google.com/recaptcha/faq|url-status=live}}</ref>
<ref name="BBCreport">{{cite news |url=http://news.bbc.co.uk/2/hi/technology/7023627.stm |title=Spam weapon helps preserve books |publisher=BBC |first=Paul |last=Rubens |date=October 2, 2007 |access-date=October 3, 2007 |archive-date=May 18, 2013 |archive-url=https://web.archive.org/web/20130518150746/http://news.bbc.co.uk/2/hi/technology/7023627.stm |url-status=live }}</ref>
<ref name="craig">{{cite web |url=http://blog.craigslist.org/2008/06/fight-spam-digitize-books/ |title=Fight Spam, Digitize Books |publisher=Craigslist Blog |date=June 2008 |access-date=June 17, 2008 |archive-date=July 6, 2010 |archive-url=https://web.archive.org/web/20100706020614/http://blog.craigslist.org/2008/06/fight-spam-digitize-books/ |url-status=live }}</ref>
<ref name="AutoK4-5">{{cite web|url=https://www.dtv2009.gov/|title=TV Converter Box Program|website=dtv2009.gov|url-status=dead|archive-url=https://web.archive.org/web/20091104004349/https://www.dtv2009.gov/|archive-date=November 4, 2009}}</ref>
<ref name="CBC2">{{cite web |url=http://www.cbc.ca/spark/2011/11/full-interview-luis-von-ahn-on-duolingo/ |title="Full Interview: Luis von Ahn on Duolingo", Spark, November 2011 |publisher=Canadian Broadcasting Corporation |date=November 30, 2011 |access-date=2013-07-10 |archive-date=June 3, 2012 |archive-url=https://web.archive.org/web/20120603142110/http://www.cbc.ca/spark/2011/11/full-interview-luis-von-ahn-on-duolingo/ |url-status=live }}</ref>
<ref name ="AutoK4-9">{{cite web|last1=Hutchinson|first1=Alex|title=Human Resources: The job you didn't even know you had|url=http://thewalrus.ca/human-resources/|website=The Walrus|access-date=December 7, 2015|date=2009-03-12|archive-date=December 3, 2015|archive-url=https://web.archive.org/web/20151203192933/http://thewalrus.ca/human-resources/|url-status=live}}</ref>
<ref name="AutoK4-8">{{cite web | first=John | last=Timmer | title=CAPTCHAs work? for digitizing old, damaged texts, manuscripts | date=August 14, 2008 | website=Ars Technica | url=https://arstechnica.com/news.ars/post/20080814-captchas-workfor-digitizing-old-damaged-texts-manuscripts.html | access-date=2008-12-09 | archive-date=January 24, 2009 | archive-url=https://web.archive.org/web/20090124143328/http://arstechnica.com/news.ars/post/20080814-captchas-workfor-digitizing-old-damaged-texts-manuscripts.html | url-status=live }}</ref>
<ref name="DM">{{cite web |url=http://groups.google.com/group/recaptcha/browse_thread/thread/c53efadf7ac89fd2 |title="questionable validity of results if words are presented out of context", Google Groups, August 29, 2008 |access-date=2013-07-10 |archive-date=April 30, 2011 |archive-url=https://web.archive.org/web/20110430020243/http://groups.google.com/group/recaptcha/browse_thread/thread/c53efadf7ac89fd2 |url-status=live }}</ref>
<ref name="FAQ">{{cite web |url=http://recaptcha.net/faq.html |title=FAQ |publisher=reCAPTCHA.net |url-status=dead |archive-url=https://archive.today/20120716142737/http://recaptcha.net/faq.html |archive-date=July 16, 2012 }}</ref>
<ref name="Strong CAPTCHA Guidelines">{{cite web |url=http://bitland.net/captcha.pdf |title=Strong CAPTCHA Guidelines |access-date=January 31, 2011 |archive-date=July 23, 2011 |archive-url=https://web.archive.org/web/20110723025202/http://bitland.net/captcha.pdf |url-status=live }}</ref>
<ref name="Register Article">{{cite web |url=https://www.theregister.co.uk/2009/12/14/google_recaptcha_busted/ |title=Google's reCAPTCHA busted by new attack |website=The Register |access-date=August 10, 2017 |archive-date=August 10, 2017 |archive-url=https://web.archive.org/web/20170810135405/https://www.theregister.co.uk/2009/12/14/google_recaptcha_busted/ |url-status=live }}</ref>
<ref name="Speaker Program">{{cite web |url=http://www.defcon.org/html/defcon-18/dc-18-speakers.html#Houck |title=Def Con 18 Speakers |publisher=defcon.org |access-date=November 17, 2010 |archive-date=October 20, 2010 |archive-url=https://web.archive.org/web/20101020231800/http://defcon.org/html/defcon-18/dc-18-speakers.html#Houck |url-status=live }}</ref>
<ref name="Decoding reCAPTCHA">{{cite web |url=http://n3on.org/projects/reCAPTCHA/docs/reCAPTCHA.docx |title=Decoding reCAPTCHA Paper |publisher=Chad Houck |url-status=dead |archive-url=https://web.archive.org/web/20100819053439/http://n3on.org/projects/reCAPTCHA/docs/reCAPTCHA.docx |archive-date=August 19, 2010 }}</ref>
<ref name="Decoding reCAPTCHA pptx">{{cite web |url=http://n3on.org/projects/reCAPTCHA/docs/reCAPTCHA.pptx |title=Decoding reCAPTCHA Power Point |publisher=Chad Houck |url-status=dead |archive-url=https://web.archive.org/web/20101024210642/http://n3on.org/projects/reCAPTCHA/docs/reCAPTCHA.pptx |archive-date=October 24, 2010 }}</ref>
<ref name="Project Stiltwalker">{{cite web|url=http://www.dc949.org/projects/stiltwalker/|title=Project Stiltwalker|access-date=May 28, 2012|archive-date=July 2, 2012|archive-url=https://web.archive.org/web/20120702143240/http://www.dc949.org/projects/stiltwalker/|url-status=live}}</ref>
<ref name="Mailhide">{{cite web |url=http://www.google.com/recaptcha/mailhide/ |title=Mailhide: Free Spam Protection |access-date=May 15, 2011 |archive-date=January 2, 2012 |archive-url=https://web.archive.org/web/20120102071715/http://www.google.com/recaptcha/mailhide/ |url-status=live }}</ref>
<ref name="MailhideDiscontinued">{{cite web |url=https://groups.google.com/forum/#!topicsearchin/recaptcha/mailhide/recaptcha/CxzVXBI6G84 |title=Mailhide: Service discontinued |access-date=March 3, 2019 |archive-date=November 7, 2012 |archive-url=https://web.archive.org/web/20121107005815/http://groups.google.com/group/sci.virtual-worlds/msg/04d8c4cfc77154b4?dmode=source&output=gplain#!topicsearchin/recaptcha/mailhide/recaptcha/CxzVXBI6G84 |url-status=live }}</ref>
}}
== Further reading ==
* {{Cite web |last1=Dzieza |first1=Josh |title=Why CAPTCHAs have gotten so difficult |work=The Verge |date=2019-02-01 |url=https://www.theverge.com/2019/2/1/18205610/google-captcha-ai-robot-human-difficult-artificial-intelligence |df=mdy-all }}
* {{cite web |first1=Katharine|last1=Schwab|title=Google's new reCAPTCHA has a dark side|url=https://www.fastcompany.com/90369697/googles-new-recaptcha-has-a-dark-side|date=27 June 2019|website=Fast Company}}
== External links ==
{{Commons category}}
* {{Official website}}
{{Google LLC}}
Latest revision as of 04:54, 7 January 2025
CAPTCHA implementation owned by Google
This article is about a specific implementation of a CAPTCHA. For the original test, see CAPTCHA.
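Original author(s) | Luis von Ahn, Manuel Blum, David Abraham, Michael Crawford, Ben Maurer, Colin McMillen, Harshad Bhujbal, Edison Tan |
---|---|
Developer(s) | Google |
Initial release | May 27, 2007 |
Type | Classic version: CAPTCHA; new version: behavioral analysis |
Website | google.com/recaptcha |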
reCAPTCHA Inc. is a CAPTCHA system owned by Google. It enables web hosts to distinguish between human and automated access to websites. The original version asked users to decipher hard-to-read text or match images. Version 2 also asked users to decipher text or match images if the analysis of cookies and canvas rendering suggested the page was being downloaded automatically. Since version 3, reCAPTCHA will never interrupt users and is intended to run automatically when users load pages or click buttons.
The original iteration of the service was a mass collaboration platform designed for the digitization of books, particularly those that were too illegible to be scanned by computers. The verification prompts utilized pairs of words from scanned pages, with one known word used as a control for verification, and the second used to crowdsource the reading of an uncertain word. reCAPTCHA was originally developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison Tan at Carnegie Mellon University's main Pittsburgh campus. It was acquired by Google in September 2009. The system helped to digitize the archives of The New York Times, and was subsequently used by Google Books for similar purposes.
The system was reported as displaying over 100 million CAPTCHAs every day, on sites such as Facebook, TicketMaster, Twitter, 4chan, CNN.com, StumbleUpon, Craigslist (since June 2008), and the U.S. National Telecommunications and Information Administration's digital TV converter box coupon program website (as part of the US DTV transition).
In 2014, Google pivoted the service away from its original concept, with a focus on reducing the amount of user interaction needed to verify a user, and only presenting human recognition challenges (such as identifying images in a set that satisfy a specific prompt) if behavioral analysis suspects that the user may be a bot.
In October 2023, it was reported that Microsoft's Bing Chat, which is built on OpenAI's GPT-4, could be tricked into solving CAPTCHAs.
Origin
Distributed Proofreaders was the first project to use volunteers to decipher scanned text that could not be read by optical character recognition (OCR) programs. It works with Project Gutenberg to digitize public-domain material and uses methods quite different from reCAPTCHA's.
The reCAPTCHA program originated with Guatemalan computer scientist Luis von Ahn, and was aided by a MacArthur Fellowship. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles".
Operation
reCAPTCHA v1 (human-assisted OCR)
Scanned text is subjected to analysis by two different OCRs. Any word that is deciphered differently by the two OCR programs or that is not in an English dictionary is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed, out of context, sometimes along with a control word already known. If the human types the control word correctly, then the response to the questionable word is accepted as probably valid. If enough users were to correctly type the control word, but incorrectly type the second word which OCR had failed to recognize, then the digital version of documents could end up containing the incorrect word. The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification hits 2.5 points, the word is considered valid. Those words that are consistently given a single identity by human judges are later recycled as control words. If the first three guesses match each other but do not match either of the OCRs, they are considered a correct answer, and the word becomes a control word. When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable.
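The scoring rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual code: the class and method names are hypothetical, answers are compared as exact strings, and a user "rejecting" a word is modeled as an explicit `None` answer. The weights (0.5 per OCR reading, 1 per human answer), the 2.5-point acceptance threshold and the six-rejection cutoff are taken from the paragraph above.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional

OCR_WEIGHT = 0.5        # each OCR program's reading is worth half a point
HUMAN_WEIGHT = 1.0      # each human answer is worth a full point
ACCEPT_THRESHOLD = 2.5  # a reading with at least this many points is accepted
MAX_REJECTIONS = 6      # this many "unreadable" votes discards the word


class SuspiciousWord:
    """Hypothetical vote tracker for one word the OCR programs could not settle."""

    def __init__(self, ocr_readings: list[str]) -> None:
        self.scores: dict[str, float] = defaultdict(float)
        self.rejections = 0
        self.result: Optional[str] = None
        self.discarded = False
        # Seed the tally with each OCR program's reading.
        for reading in ocr_readings:
            self.scores[reading] += OCR_WEIGHT

    def add_human_answer(self, answer: Optional[str]) -> None:
        """Record one human answer; None models a user rejecting the word."""
        if self.result is not None or self.discarded:
            return  # the word has already been decided
        if answer is None:
            self.rejections += 1
            if self.rejections >= MAX_REJECTIONS:
                self.discarded = True  # treated as unreadable
            return
        self.scores[answer] += HUMAN_WEIGHT
        if self.scores[answer] >= ACCEPT_THRESHOLD:
            self.result = answer  # candidate for later reuse as a control word
```

For example, if the two OCR programs read "fife" and "file", two human answers of "fife" bring that reading to 0.5 + 2 = 2.5 points, so it is accepted. Three matching human answers that agree with neither OCR reach 3 points on their own, which is why the "first three guesses match" case also counts as correct under this scheme.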
The original reCAPTCHA method showed the questionable words in isolation, as out-of-context corrections, rather than in use (for example, within a five-word phrase from the original document). The control word could also mislead the reader about the second word: a prompt of "/metal/ /fife/" might be entered as "metal file", because a metal filing tool is a more common association than the musical instrument "fife".
In 2012, reCAPTCHA began using photographs taken from the Google Street View project, in addition to scanned words. It asks users to identify images of crosswalks, street lights, and other objects. It has been hypothesized that the data is used by Waymo (a subsidiary of Google's parent company, Alphabet) to train autonomous vehicles, though an unnamed representative denied this, stating that as of mid-2021 the data was only being used to improve Google Maps.
Google charges for the use of reCAPTCHA on websites that make over a million reCAPTCHA queries a month.
reCAPTCHA v1 was declared end-of-life and shut down on March 31, 2018.
reCAPTCHA v2 (checkbox)
In 2013, reCAPTCHA began implementing behavioral analysis of the browser's interactions to predict whether the user was a human or a bot. The following year, Google began to deploy a new reCAPTCHA API featuring "no CAPTCHA reCAPTCHA", in which users deemed to be of low risk only need to click a single checkbox to verify that they are human. A CAPTCHA may still be presented if the system is uncertain of the user's risk; Google also introduced a new type of CAPTCHA challenge designed to be more accessible to mobile users, in which the user must select images matching a specific prompt from a grid.
reCAPTCHA v3 and reCAPTCHA Enterprise (invisible)
In 2017, Google introduced a new "invisible" reCAPTCHA, where verification occurs in the background, and no challenges are displayed at all if the user is deemed to be of low risk. According to former Google "click fraud czar" Shuman Ghosemajumder, this capability "creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human."
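reCAPTCHA v3 exposes this background risk assessment to the website as a score between 0.0 (very likely a bot) and 1.0 (very likely a human), returned together with the action name the page supplied when requesting the token. The sketch below shows how a site's server might act on that verification result; it is an illustration under assumptions, not Google's reference code. The function name is hypothetical, the verification response is assumed to have already been fetched as JSON text, and the 0.5 cutoff is the starting threshold suggested in Google's v3 documentation, which each site is expected to tune.

```python
from __future__ import annotations

import json

# Starting point suggested by Google's v3 documentation; sites tune this value.
DEFAULT_THRESHOLD = 0.5


def is_probably_human(verify_json: str, expected_action: str,
                      threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Decide from a v3 verification response whether to let a request through."""
    data = json.loads(verify_json)
    if not data.get("success", False):
        return False  # token invalid, expired, or already used
    if data.get("action") != expected_action:
        return False  # token was issued for a different action on the site
    # Higher scores indicate a more human-like interaction.
    return float(data.get("score", 0.0)) >= threshold
```

Checking the action name is meant to stop a token obtained on one part of a site (say, a search box) from being replayed against another, such as a login form; requests that fall below the threshold could be sent to a secondary check rather than rejected outright.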
Implementation
The reCAPTCHA tests are served from the central site of the reCAPTCHA project, which in the original version supplied the words to be deciphered. This is done through a JavaScript API: after the user submits the form, the website's server makes a callback to reCAPTCHA to verify the user's response. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is free of charge for websites making up to a million queries a month, but the reCAPTCHA software is not open-source.
reCAPTCHA also offers plugins for several web-application platforms and languages, including ASP.NET, Ruby, and PHP, to ease implementation of the service.
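As a rough illustration of the server-side callback described above, the following Python sketch builds and sends the verification request using only the standard library. It assumes the current documented `siteverify` endpoint (`https://www.google.com/recaptcha/api/siteverify`, with `secret`, `response` and optional `remoteip` form fields) rather than the retired v1 interface; the function names are hypothetical, and the secret key and token are placeholders that a real site would take from its reCAPTCHA configuration and the submitted form.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str,
                         remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build the form-encoded POST a site's server sends to check a user's token."""
    fields = {"secret": secret, "response": token}
    if remote_ip is not None:
        fields["remoteip"] = remote_ip  # optional field
    body = urllib.parse.urlencode(fields).encode("ascii")
    # urllib adds the application/x-www-form-urlencoded header for POST data.
    return urllib.request.Request(VERIFY_URL, data=body, method="POST")


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None,
                 timeout: float = 5.0) -> bool:
    """Send the request and report whether the token was accepted."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return bool(result.get("success", False))
```

In a web application using the checkbox widget, `verify_token` would typically be called from the form handler with the value of the submitted `g-recaptcha-response` field, and the form would be rejected if it returns `False`. Keeping the request construction separate makes it possible to inspect or test it without contacting Google.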
Security
The main purpose of a CAPTCHA system is to block spambots while allowing human users. On December 14, 2009, Jonathan Wilkins released a paper describing weaknesses in reCAPTCHA that allowed bots to achieve a solve rate of 18%.
On August 1, 2010, Chad Houck gave a presentation at the DEF CON 18 hacking conference detailing a method to reverse the distortion added to images, which allowed a computer program to determine a valid response 10% of the time. The reCAPTCHA system was modified on July 21, 2010, before Houck was to speak on his method. Houck adapted his method to what he described as an "easier" CAPTCHA, against which it determined a valid response 31.8% of the time. Houck also described security defenses in the system, including a high-security lockout if an invalid response is given 32 times in a row.
On May 26, 2012, Adam, C-P, and Jeffball of DC949 gave a presentation at the LayerOne hacker conference detailing how they had achieved an automated solution with an accuracy rate of 99.1%. Their tactic was to use techniques from machine learning, a subfield of artificial intelligence, to analyze the audio version of reCAPTCHA, which is available for the visually impaired. Google released a new version of reCAPTCHA just hours before their talk, making major changes to both the audio and visual versions of the service. In this release, the audio version was lengthened from 8 seconds to 30 seconds and became much more difficult to understand, for humans and bots alike. In response to this update and the one that followed, the members of DC949 released two more versions of their solver, Stiltwalker, which beat reCAPTCHA with accuracies of 60.95% and 59.4% respectively. After each successive break, Google updated reCAPTCHA within a few days. According to DC949, Google often reverted to features that had previously been broken.
On June 27, 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes published a paper describing a system that recognized reCAPTCHA images with an accuracy of 82%. The authors have not said whether their system can solve recent reCAPTCHA images, although they describe their work as intelligent OCR that is robust to some, if not all, changes in the image database.
In an August 2012 presentation at BSidesLV 2012, DC949 called the latest version "unfathomably impossible for humans", noting that they were unable to solve the new challenges manually either. In May 2012, the web accessibility organization WebAIM reported from its survey of screen reader users: "Over 90% of respondents find CAPTCHA to be very or somewhat difficult".
Criticism
The original iteration of reCAPTCHA was criticized as being a source of unpaid work to assist in transcribing efforts.
Critics have also argued that Google profits from reCAPTCHA by using its users as unpaid workers to improve its AI research.
Privacy
The current iteration of the system has been criticized for its reliance on tracking cookies and its promotion of vendor lock-in with Google services; administrators are encouraged to include reCAPTCHA tracking code on all pages of their website to analyze the behavior and "risk" of users, which determines the level of friction presented when a reCAPTCHA prompt is used. Google stated in its privacy policy that user data collected in this manner is not used for personalized advertising. It was also discovered that the system favors users who are logged in to an active Google account, and assigns a higher risk to those using anonymizing proxies and VPN services.
Concerns were raised regarding privacy when Google announced reCAPTCHA v3.0, as it allows Google to track users on non-Google websites.
In April 2020, Cloudflare switched from reCAPTCHA to hCaptcha, citing privacy concerns over Google's potential use of the data it collects through reCAPTCHA for targeted advertising, as well as a desire to cut operating costs, since a considerable portion of Cloudflare's customers do not pay for its services. In response, Google told PC Magazine that data from reCAPTCHA is never used for personalized advertising.
Accessibility
Google's help center states that reCAPTCHA is not supported for the deafblind community, effectively locking such users out of all pages that use the service. However, as of 2020, reCAPTCHA had the longest list of accessibility considerations of any CAPTCHA service.
Interface
In one variant of the image challenge, selected images are not highlighted; instead, they fade out when clicked and are replaced by new images that fade in, resembling whack-a-mole.
Users have criticized how long the images take to fade out and in.
Derivative projects
reCAPTCHA also created the Mailhide project, which protected email addresses on web pages from being harvested by spammers. By default, the email address was converted into a format that did not allow a crawler to see the full email address; for example, "mailme@example.com" would have been converted to "mai...@example.com". The visitor would then click on the "..." and solve the CAPTCHA to obtain the full email address. One could also edit the pop-up code so that no part of the address was visible. Mailhide was discontinued in 2018 because it relied on reCAPTCHA v1.
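The address masking in the example above can be sketched in a few lines of Python. This is a hypothetical reconstruction from that single example, not Mailhide's code: it assumes the service kept the first three characters of the part before the "@" together with the whole domain, and it does not model the CAPTCHA step that revealed the full address.

```python
def mask_address(address: str, visible: int = 3) -> str:
    """Shorten the local part of an email address (assumed Mailhide-style rule)."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {address!r}")
    # Keep only the first few characters before "@"; "..." marks the hidden part.
    return f"{local[:visible]}...@{domain}"


print(mask_address("mailme@example.com"))  # mai...@example.com
```

Under this assumed rule, very short local parts (such as "jo@example.org") would be shown in full before the "...", which is one reason the real service may have used a different cutoff.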
References
- "Recaptcha Inc". OpenCorporates. August 28, 2007. Archived from the original on August 20, 2023. Retrieved August 20, 2023.
- Shet, Vinay (December 3, 2014). "Are you a robot? Introducing 'No CAPTCHA reCAPTCHA'". Archived from the original on September 3, 2020. Retrieved February 24, 2021.
- "reCAPTCHA v3". Archived from the original on September 25, 2020. Retrieved September 8, 2020.
- Ahn, Luis von (December 6, 2011), Massive-scale online collaboration, archived from the original on July 15, 2020, retrieved April 14, 2020
- "reCAPTCHA: About Us". Archived from the original on June 11, 2010. Retrieved August 14, 2018.
- "Teaching computers to read: Google acquires reCAPTCHA". Archived from the original on May 19, 2013. Retrieved September 16, 2009.
- "Deciphering Old Texts, One Woozy, Curvy Word at a Time". The New York Times. March 28, 2011. Archived from the original on November 17, 2017. Retrieved November 20, 2017.
- "reCAPTCHA FAQ". Archived from the original on July 5, 2010. Retrieved June 12, 2011.
- Rubens, Paul (October 2, 2007). "Spam weapon helps preserve books". BBC. Archived from the original on May 18, 2013. Retrieved October 3, 2007.
- "Fight Spam, Digitize Books". Craigslist Blog. June 2008. Archived from the original on July 6, 2010. Retrieved June 17, 2008.
- "TV Converter Box Program". dtv2009.gov. Archived from the original on November 4, 2009.
- Edwards, Benj (October 2, 2023). "Dead grandma locket request tricks Bing Chat's AI into solving security puzzle". Ars Technica. Archived from the original on October 10, 2023. Retrieved October 25, 2023.
- ""Full Interview: Luis von Ahn on Duolingo", Spark, November 2011". Canadian Broadcasting Corporation. November 30, 2011. Archived from the original on June 3, 2012. Retrieved July 10, 2013.
- Hutchinson, Alex (March 12, 2009). "Human Resources: The job you didn't even know you had". The Walrus. Archived from the original on December 3, 2015. Retrieved December 7, 2015.
- Timmer, John (August 14, 2008). "CAPTCHAs work? for digitizing old, damaged texts, manuscripts". Ars Technica. Archived from the original on January 24, 2009. Retrieved December 9, 2008.
- von Ahn, Luis; Maurer, Ben; McMillen, Colin; Abraham, David; Blum, Manuel (2008). "reCAPTCHA: Human-Based Character Recognition via Web Security Measures". Science. 321 (5895): 1465–1468. Bibcode:2008Sci...321.1465V. CiteSeerX 10.1.1.141.6563. doi:10.1126/science.1160379. PMID 18703711. S2CID 18371056.
- ""questionable validity of results if words are presented out of context", Google Groups, August 29, 2008". Archived from the original on April 30, 2011. Retrieved July 10, 2013.
- Perez, Sarah (March 29, 2012). "Google Now Using ReCAPTCHA To Decode Street View Addresses". TechCrunch. Archived from the original on August 23, 2012. Retrieved July 10, 2013.
- Vega, Edward (May 14, 2021). "Why captchas are getting harder". Vox. Archived from the original on April 15, 2022. Retrieved April 15, 2022.
- "Cloudflare Dumps Google's ReCAPTCHA Over Privacy Concerns, Costs". PCMag. Archived from the original on July 19, 2020. Retrieved July 18, 2020.
- "Google reCAPTCHA v1 API Shutting Down in March 2018". ProgrammableWeb. Archived from the original on June 20, 2020. Retrieved April 14, 2020.
- Greenberg, Andy (December 3, 2014). "Google Can Now Tell You're Not a Robot with Just One Click". Wired. Archived from the original on October 2, 2015. Retrieved October 1, 2015.
- Schwab, Katharine (June 27, 2019). "Google's new reCAPTCHA has a dark side". Fast Company. Archived from the original on June 28, 2019. Retrieved April 8, 2020.
- Amadeo, Ron (March 9, 2017). "Google's reCAPTCHA turns 'invisible,' will separate bots from people without challenges". Ars Technica. Archived from the original on August 6, 2020. Retrieved April 14, 2020.
- "Google just made the internet a tiny bit less annoying". Popular Science. March 10, 2017. Archived from the original on February 5, 2021. Retrieved April 5, 2017.
- "FAQ". reCAPTCHA.net. Archived from the original on July 16, 2012.
- "reCAPTCHA: Stop Spam, Read Books". Archived from the original on June 19, 2020. Retrieved January 14, 2014.
- "Developer's Guide—reCAPTCHA". Google Inc. Archived from the original on November 24, 2017. Retrieved January 14, 2014.
- Greenberg, Andy (June 18, 2010). "Those Scrambled Word Tests For Stopping Spambots Are Tough For Humans Too". Forbes. Archived from the original on September 9, 2017. Retrieved September 10, 2017.
- "Strong CAPTCHA Guidelines" (PDF). Archived (PDF) from the original on July 23, 2011. Retrieved January 31, 2011.
- "Google's reCAPTCHA busted by new attack". The Register. Archived from the original on August 10, 2017. Retrieved August 10, 2017.
- "Google's reCAPTCHA dented". Archived from the original on March 10, 2010. Retrieved January 31, 2011.
- "Def Con 18 Speakers". defcon.org. Archived from the original on October 20, 2010. Retrieved November 17, 2010.
- "Decoding reCAPTCHA Paper". Chad Houck. Archived from the original on August 19, 2010.
- "Decoding reCAPTCHA Power Point". Chad Houck. Archived from the original on October 24, 2010.
- "Project Stiltwalker". Archived from the original on July 2, 2012. Retrieved May 28, 2012.
- Cruz-Perez, Claudia; Starostenko, Oleg; Uceda-Ponga, Fernando; Alarcon-Aquino, Vicente; Reyes-Cabrera, Leobardo (June 27, 2012), Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco; Olvera López, José Arturo; Boyer, Kim L. (eds.), "Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character Segmentation and Recognition", Pattern Recognition, vol. 7329, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 155–165, doi:10.1007/978-3-642-31149-9_16, ISBN 978-3-642-31148-2, S2CID 29097170, retrieved January 23, 2013
- "Screen Reader User Survey #4 Results". Archived from the original on December 10, 2017. Retrieved April 19, 2013.
- Harris, David L. (January 23, 2015). "Massachusetts woman's lawsuit accuses Google of using free labor to transcribe books, newspapers". Boston Business Journal. Archived from the original on April 28, 2015. Retrieved September 4, 2015.
- "No CAPTCHA: yet another ruse devised by Google to extract free digital labor from you". Archived from the original on November 12, 2020. Retrieved December 3, 2020.
- Taylor, Chris (February 26, 2024). "Stop giving your website data away!". Prosopo.
- "Moving from reCAPTCHA to hCaptcha". The Cloudflare Blog. April 8, 2020. Archived from the original on August 12, 2020. Retrieved July 18, 2020.
- "What is CAPTCHA? - G Suite Admin Help". Archived from the original on August 6, 2020. Retrieved May 11, 2020.
- "WCAG 1.1: Text Alternatives [Article]". October 6, 2020. Archived from the original on November 26, 2020. Retrieved December 10, 2020.
- "ReCaptcha extremly [sic] slow fading · Issue #268 · google/recaptcha". GitHub. Archived from the original on October 14, 2020. Retrieved October 14, 2020.
- "Mailhide: Free Spam Protection". Archived from the original on January 2, 2012. Retrieved May 15, 2011.
- "Mailhide: Service discontinued". Archived from the original on November 7, 2012. Retrieved March 3, 2019.
Further reading
- Dzieza, Josh (February 1, 2019). "Why CAPTCHAs have gotten so difficult". The Verge.
- Schwab, Katharine (June 27, 2019). "Google's new reCAPTCHA has a dark side". Fast Company.