ReCAPTCHA’s quality is going down?

Several months ago, we implemented ReCAPTCHA on MetaFilter’s contact forms to thwart spammers. It’s a good cause and a great idea: the nonsensical-looking text you decode ends up helping public-domain book-scanning projects.

But lately we’ve been getting a steady stream of complaints that it isn’t working or can’t be solved. Last night I tried the contact form myself and was surprised that of the first ten images presented to me (you can keep hitting the little refresh button, the top of the three buttons on the control), at least half were totally undecipherable.

Here’s an actual screenshot of one I saw this morning. The first word is impossible to decipher. My question is, has ReCAPTCHA had such success that all we’re left with is the really, really bad book scans?

20 Comments

  • We actually got a new kind of spam that seemed to be semi-manually generated and came to the site *because* we implemented the ReCaptcha.

  • I noticed the same thing a couple of days ago. I had to press the reload button two or three times to get one I could decipher. I hope your conjecture about its success is correct, but I also hope they find a way to make it more usable again.

  • I don’t know if they’re using reCAPTCHA, but I noticed the same problem on TicketMaster.ca a few days ago. I kept refreshing until I finally got one I could reasonably guess at.

  • I often have a hard time reading those things.

  • what recaptcha?

  • Hi,
    I’m an engineer on the reCAPTCHA team, I just wanted to explain some of what’s going on here.
    First, I want to assure you that we’re making sure the negative impact of our CAPTCHAs on users doesn’t grow out of control. We do this by measuring a number of statistics (the success rate at passing the CAPTCHA, the number of refreshes).
    We’re working on various ways to increase the readability of reCAPTCHA without reducing its security. We’re also considering allowing sites to express a preference for how secure their CAPTCHA should be. A contact form’s CAPTCHA might not need to be as secure as TicketMaster’s.
    I also thought I might give some background on why the first word in this CAPTCHA is hard to read. The word appears to be “will”. Notice how small it is compared to the second word. This is a mistake. When we create a CAPTCHA, we have to ensure that the words are a reasonable height. However, we can’t just resize our bitmap images to the correct height. Think about the difference between the word “ocean” and the word “recaptcha”. Every letter in “ocean” is the height of an “x”, while “recaptcha” has both ascenders and descenders. If we took bitmaps of these two words and made them the same height, “recaptcha” would appear much smaller.
    To prevent this, we look at what an OCR program recognized as the x-height of the image and use that to judge the height of the word. However, in rare cases this estimate is wrong. If you look closely at the image, you’ll see that the height of the “l” in “will” is the same as the height of the “e” in “decorated”.
    – Ben

  • People usually miss one of the most important aspects of recaptcha — it doesn’t know what one of the words is. So, if recaptcha presents two words, and one is very clear and easy to decipher while the other is gobbledygook, then you can enter gibberish for the undecipherable word and still make it past. In the example above you can enter “decorated iAmARutabega” and it would still register as a valid entry.
    Such actions shouldn’t hurt the reCAPTCHA project, because it requires a consensus among many users’ answers before it accepts a reading of the unknown word.

  • There was a recent report that Google’s CAPTCHA system had been cracked by spammers, but Google is now questioning those reports, saying the recent increase in Gmail and Blogspot spam comes from accounts created by spammers who hire low-wage laborers to solve the CAPTCHAs manually.

  • The first word is wall. Try guessing what an idiot would guess the word to be!

  • No, the first word is clearly “will”!

  • My guess would be “will” for the first one, so it’s not sooooo difficult ;) But I agree that this happens often lately.

  • I’ve only used recaptcha for about a month, so I can’t speak to a trend. But I’m usually able to tell which of the two words is known, and which word recaptcha is learning.
    Earlier today recaptcha presented me with what was obviously a Greek “pi” character (π)…I didn’t know what to say.

  • We have had a similar issue on StumbleUpon, so we added a prominent link that says “Can’t read this?” which calls Recaptcha.reload() to fetch a new image.
    From what I’ve been told by the folks at reCAPTCHA, they’re working on making the rendering more legible.
    Cheers,
    Eric Goldberg
    StumbleUpon Dev

  • Tom:
    Good guess … but not every one of us can do it …
    I tried reCAPTCHA on my blog … it didn’t seem to work out well for commenters … so I had to remove it …
    reCAPTCHA has nobel intentions … but it doesn’t seem to work as advertised …

  • Leo F. Swiontek:
    I’ve only been using reCAPTCHA for the past few days, so I can’t comment on it yet. I’ll try it out soon and see whether it works.

  • “nobel” intentions? Wow, they must be very ambitious for spam protection – I’m not sure there’s a prize in that category…
    I’ve had good results & personally prefer it to deciding if the “O” or “l” in “sKl7jOK” are letters or numbers, if case matters, etc.
    Viva reCAPTCHA…

  • Seriously? Seems obvious it’s “will”
    Is this like those Magic Eye puzzles that some people just can’t get?

  • I have only a very small spam problem on my site, so I am going to try reCAPTCHA. But I will keep an eye on how difficult it is to read.
    What about offering two CAPTCHAs, the usual numbers-and-letters kind and a reCAPTCHA, so visitors can choose the type they prefer?

  • “will”? I am sure you could have picked a tougher example to show that it is difficult to figure out.

  • I’ve just tested the audio capability of reCAPTCHA because we need to provide an accessible implementation of CAPTCHA. I have to say that the audio version is completely indecipherable – are efforts underway to address this issue? Can anyone suggest an implementation that is more understandable?
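The x-height normalization described in the reCAPTCHA engineer’s comment can be illustrated with a small Python sketch. It models only the arithmetic, not real image processing, and it is not reCAPTCHA’s actual code: `WordImage`, `TARGET_X_HEIGHT`, and all pixel heights are made-up values chosen for illustration. It compares scaling each bitmap to a fixed overall height against scaling by the OCR-estimated x-height, and shows how one bad x-height estimate (as with “will”) shrinks a word.

```python
from dataclasses import dataclass

# Hypothetical target size (pixels) for a lowercase "x" in the rendered
# CAPTCHA; reCAPTCHA's real value is not known, 20 is purely illustrative.
TARGET_X_HEIGHT = 20


@dataclass
class WordImage:
    text: str
    height: int          # full bitmap height, ascenders/descenders included
    true_x_height: int   # actual x-height (known here only for the demo)
    ocr_x_height: int    # x-height as estimated by an OCR pass


def naive_scale(word: WordImage, target_height: int) -> float:
    """Force the whole bitmap to one height (the approach the comment rejects)."""
    return target_height / word.height


def x_height_scale(word: WordImage) -> float:
    """Scale so the OCR-estimated x-height becomes TARGET_X_HEIGHT."""
    return TARGET_X_HEIGHT / word.ocr_x_height


def rendered_x_height(word: WordImage, scale: float) -> float:
    """How tall a lowercase 'x' actually ends up after scaling."""
    return word.true_x_height * scale


if __name__ == "__main__":
    words = [
        # "ocean": only x-height letters, so bitmap height == x-height.
        WordImage("ocean", height=20, true_x_height=20, ocr_x_height=20),
        # "recaptcha": ascenders and descenders make the bitmap taller.
        WordImage("recaptcha", height=40, true_x_height=20, ocr_x_height=20),
        # "will": the OCR pass mistakes the height of the "l" for the x-height.
        WordImage("will", height=40, true_x_height=20, ocr_x_height=40),
    ]
    for w in words:
        naive = rendered_x_height(w, naive_scale(w, target_height=40))
        by_x = rendered_x_height(w, x_height_scale(w))
        print(f"{w.text:10} naive={naive:5.1f}  x-height-normalized={by_x:5.1f}")
```

With these made-up numbers, forcing every bitmap to 40 px gives “ocean” letters twice the size of “recaptcha” letters, while normalizing by x-height gives both the same 20 px letters. The misestimated “will” comes out at half size, which matches the shrunken first word in the screenshot.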
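The point raised in the comment about typing gibberish for the unknown word can be sketched as a tiny Python model. This is a hypothetical illustration, not reCAPTCHA’s actual server logic: the class `TwoWordChallenge`, the case-insensitive matching, and the three-vote `AGREEMENT_THRESHOLD` are all assumptions. Only the control word, whose answer is known, decides pass or fail. The answer for the unknown word is merely tallied, and a reading is accepted only once enough answers agree.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical: number of matching human answers an unknown word needs
# before its transcription is accepted. Not reCAPTCHA's real value.
AGREEMENT_THRESHOLD = 3


def normalize(text: str) -> str:
    """Assumed matching rule: ignore surrounding whitespace and case."""
    return text.strip().lower()


class TwoWordChallenge:
    def __init__(self) -> None:
        # unknown-word id -> tally of the answers submitted for it
        self.votes: dict[str, Counter] = defaultdict(Counter)

    def submit(self, control_answer: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        """Pass or fail depends only on the control word."""
        if normalize(control_guess) != normalize(control_answer):
            return False
        # The unknown word cannot be graded, so the guess is only recorded.
        self.votes[unknown_id][normalize(unknown_guess)] += 1
        return True

    def transcription(self, unknown_id: str) -> str | None:
        """Return the leading answer once it reaches the agreement threshold."""
        tally = self.votes.get(unknown_id)
        if not tally:
            return None
        answer, count = tally.most_common(1)[0]
        return answer if count >= AGREEMENT_THRESHOLD else None


if __name__ == "__main__":
    c = TwoWordChallenge()
    # Gibberish for the unknown word still passes...
    print(c.submit("decorated", "decorated", "word-17", "iAmARutabega"))  # True
    # ...but a wrong control word fails no matter what.
    print(c.submit("decorated", "decorate", "word-17", "will"))           # False
    for _ in range(3):
        c.submit("decorated", "Decorated", "word-17", "will")
    print(c.transcription("word-17"))  # will
```

In this model, a single gibberish answer costs the project nothing: it is one vote that never reaches the threshold, while the consistent answers eventually win.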
