A Whole Lotta Nothing Matt Haughey’s Personal Blog

Posted
27 March 2008 @ 9am

Tagged
ask the readers, flickr

ReCAPTCHA’s quality is going down?

Several months ago, we implemented ReCAPTCHA on MetaFilter contact forms, to thwart spammers. It’s a good cause and a great idea: the nonsensical text you decode ends up helping public domain book scanning projects.

But lately, we’ve been getting a steady stream of complaints that it is not working or is unsolvable. Last night I tried out the contact form and was surprised that in the first ten images presented to me (keep hitting the little refresh button, the top of the three buttons on the control), at least half were totally undecipherable.

Here’s an actual screenshot of one I saw this morning. The first word is impossible to decipher. My question is, has ReCAPTCHA had such success that all we’re left with is the really, really bad book scans?


17 Comments

Posted by
MissPinkKate
27 March 2008 @ 11am

I often have a hard time reading those things.


Posted by
Peter
27 March 2008 @ 11am

I don’t know if they’re using reCAPTCHA, but I noticed the same problem on TicketMaster.ca a few days ago. I kept refreshing until I finally got one I could reasonably guess at.


Posted by
Jordan Running
27 March 2008 @ 12pm

I noticed the same thing a couple of days ago. I had to press the reload button two or three times to get one I could decipher. I hope your conjecture about its success is correct, but I also hope they find a way to make it more usable again.


Posted by
Steven Garrity
27 March 2008 @ 1pm

We actually got a new kind of spam that seemed to be semi-manually generated and came to the site *because* we implemented the ReCaptcha.


Posted by
Eric
27 March 2008 @ 4pm

We have had a similar issue on StumbleUpon, so we added a prominent link that says “Can’t read this?” which calls Recaptcha.reload() to fetch a new image.

From what I’ve been told by the folks at reCAPTCHA, they’re working on fixing the rendering to be more legible.

Cheers,
Eric Goldberg
StumbleUpon Dev


Posted by
Eric
27 March 2008 @ 4pm

Also note that users need only answer one of the two words correctly. The more decipherable one is probably the known one. It might help to add text explaining this.

Also, I would suggest using the “custom” theme, since it gives you more freedom to add your own text and look and feel.

Cheers,
Eric


Posted by
Aaron Suggs
27 March 2008 @ 4pm

I’ve only used recaptcha for about a month, so I can’t speak to a trend. But I’m usually able to tell which of the two words is known, and which word recaptcha is learning.

Earlier today recaptcha presented me with what was obviously a Greek “pi” character (π)… I didn’t know what to say.


Posted by
Tom
28 March 2008 @ 4am

The first word is wall. Try guessing what an idiot would guess the word to be!


Posted by
Phil
28 March 2008 @ 4am

No, the first word is clearly “will”!


Posted by
SZoPer
28 March 2008 @ 4am

My guess would be “will” for the first one, so it’s not sooooo difficult ;) But I agree that this kind of situation happens often lately.


Posted by
Patrick
28 March 2008 @ 5am

People usually miss one of the most important aspects of recaptcha — it doesn’t know what one of the words is. So, if recaptcha presents two words, and one is very clear and easy to decipher while the other is gobbledygook, then you can enter gibberish for the undecipherable word and still make it past. In the example above you can enter “decorated iAmARutabega” and it would still register as a valid entry.

Such actions shouldn’t hurt the recaptcha project because it requires a consensus of words to decipher and perform human OCR.
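
The two-word idea described above can be sketched roughly as follows. This is only an illustration of the principle, not reCAPTCHA’s actual code: the function names, the assumption that the known word comes first in the typed answer, and the three-vote threshold are all made up for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def check_answer(answer: str, control_word: str,
                 control_first: bool = True) -> Tuple[bool, str]:
    """Return (passed, guess_for_unknown_word).

    Only the control (known) word decides pass/fail; the other word
    is simply recorded as one person's guess.
    """
    parts = answer.strip().split()
    if len(parts) != 2:
        return False, ""
    control_guess, unknown_guess = parts if control_first else parts[::-1]
    passed = control_guess.lower() == control_word.lower()
    return passed, unknown_guess.lower()


def consensus(guesses: List[str], min_votes: int = 3) -> Optional[str]:
    """Accept a reading of the unknown word only if enough people agree.

    min_votes is a hypothetical threshold chosen for this sketch.
    """
    if not guesses:
        return None
    word, votes = Counter(guesses).most_common(1)[0]
    return word if votes >= min_votes else None


# The gibberish entry from the comment above still passes ...
passed, guess = check_answer("decorated iAmARutabega", "decorated")

# ... but a lone gibberish guess never wins the vote on the unknown word.
accepted = consensus(["will", "will", "wall", "will", guess])
```

Under these assumptions a single junk guess is outvoted, which is why entering gibberish for the unreadable word should do little harm.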


Posted by
Cameron Barrett
28 March 2008 @ 6am

There was a recent report that Google’s CAPTCHA system was cracked by spammers, but Google is now questioning those reports, saying the recent increase in Gmail and Blogspot spam comes from accounts created by spammers who hire low-wage laborers to solve the CAPTCHAs manually.


Posted by
Ben Maurer
28 March 2008 @ 6am

Hi,

I’m an engineer on the reCAPTCHA team, and I just wanted to explain some of what’s going on here.

First, I want to assure you that we’re making sure that the negative impact of our CAPTCHAs on users doesn’t grow out of control. We do this by measuring a number of statistics (success at passing the CAPTCHA, number of refreshes).

We’re working on various ways to increase the readability of reCAPTCHA without reducing the security. We’re also considering allowing sites to express a preference level about the security of their CAPTCHA. A contact form’s captcha might not need to be as secure as TicketMaster.

I also thought I might give some background on why the first word is hard to read in this CAPTCHA. The word appears to be “will”. Notice how small the word is compared to the second one. This is a mistake. When we create a CAPTCHA, we have to ensure that the words are a reasonable height. However, we can’t just resize our bitmap images to the correct height. Think about the difference between the word “ocean” and the word “recaptcha”. “ocean” has letters that are all the height of an ‘x’ while “recaptcha” has both ascenders and descenders. If we took a bitmap of these two words and made them the same height, “recaptcha” would appear to be much smaller.

What we do to prevent this is look at what an OCR engine recognized as the x-height of the image, and then use this to judge the height of the word. However, in rare cases, this is incorrect. If you look closely at the image, you’ll see that the height of the l in “will” is the same as the height of the e in “decorated”.
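
To make the x-height point concrete, here is a small sketch with made-up pixel measurements. It is an illustration only, not reCAPTCHA’s real code: the numbers, the dictionary layout, and the function names are all assumptions.

```python
def scale_factor(measured_height: float, target_height: float) -> float:
    """Factor that resizes a word so measured_height becomes target_height."""
    return target_height / measured_height


def rendered_x_height(word: dict, measure: str, target: float) -> float:
    """x-height of the word after scaling so that word[measure] equals target."""
    return word["x_height"] * scale_factor(word[measure], target)


# Hypothetical measurements in pixels, same typeface and size for each word.
OCEAN = {"bitmap": 20.0, "x_height": 20.0}      # no ascenders or descenders
RECAPTCHA = {"bitmap": 40.0, "x_height": 20.0}  # t and h rise above, p drops below

# Naive approach: force both bitmaps to be 30 px tall.
naive_ocean = rendered_x_height(OCEAN, "bitmap", 30.0)          # 30.0
naive_recaptcha = rendered_x_height(RECAPTCHA, "bitmap", 30.0)  # 15.0, looks tiny

# x-height approach: scale so the x-heights match instead.
fair_ocean = rendered_x_height(OCEAN, "x_height", 20.0)          # 20.0
fair_recaptcha = rendered_x_height(RECAPTCHA, "x_height", 20.0)  # 20.0

# Failure mode: the OCR mistakes the tall "l" in "will" for the x-height,
# so the whole word is shrunk until the "l" is as tall as a normal "e".
WILL = {"bitmap": 40.0, "x_height": 20.0, "ocr_x_height": 40.0}
bad_will = rendered_x_height(WILL, "ocr_x_height", 20.0)  # 10.0
```

With these assumed numbers, a wrong x-height reading renders “will” at half the letter size of its neighbor, which matches the effect visible in the screenshot.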

- Ben


Posted by
Ben Maurer
28 March 2008 @ 6am

One other thing I think I’d point out:

Most of the time when we get a support email from somebody saying “recaptcha is too hard”, it turns out that the problem has another cause. For example, we get quite a few emails saying something like:

“The website said that the password must have at least one number in it, but I don’t see any numbers in the two words”.

We’ve also encountered websites that were broken in a specific browser and this caused users to believe they weren’t passing reCAPTCHA when in fact they were typing the correct answer.

If you’re getting lots of complaints about reCAPTCHA, it’s worth doing a thorough investigation to find out if there might be another cause.


Posted by
george
28 March 2008 @ 9am

what recaptcha?


Posted by
subcorpus
29 March 2008 @ 7pm

Tom:
good guess … but not every one of us can do it …
i tried re-captcha on my blog … didn’t seem to work out well for commenters … so had to remove it …
re-captcha has noble intentions … but it doesn’t seem to work as said …


Posted by
Leo F. Swiontek
3 April 2008 @ 1am

I’ve been using reCAPTCHA for only the past few days, so I’m not able to comment yet. I’ll soon try it and see whether it works or not.

