Science for SEO: CAPTCHA broken!

October 21, 2008

CAPTCHA broken!

Dr Jeff Yan and PhD student Ahmad Salah El Ahmad have revealed widespread vulnerabilities in the Microsoft email service. They actually cracked it in 2007, but had to notify Microsoft first and give the company time to work on it before publishing their findings.

The CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) was responsible for the vulnerabilities. It is the test used to defend against automated systems that grab email accounts in order to deliver spam or put ads on blogs.

They found that computers could break the CAPTCHA after all. Normally the machines get confused by the letters because they have to separate them and then put them in the right order. The method used by the scientists took 80 milliseconds to break the CAPTCHA. They removed the arcs in the Microsoft scheme (literally arcs drawn around the characters) that make the letters hard to decipher, and then managed to put the letters in the right order. Their colour filling method was the key to their success, combined with the usual vertical histogram analysis. Using this method the CAPTCHA can be broken 60% of the time. Wow.
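To give a feel for the "usual vertical histogram analysis", here is a minimal sketch in Python. It is not the researchers' code: the function names, the toy image, and the rule "cut wherever a column has no ink" are assumptions made for illustration. The image is a small grid where 1 means an inked pixel and 0 means background.

```python
def vertical_histogram(image):
    """Count inked pixels (1s) in each column of a binary image."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def split_on_empty_columns(image):
    """Return (start, end) column ranges, end exclusive, separated by blank columns.

    Assumption for this sketch: a column with zero inked pixels marks a gap
    between two chunks of the image.
    """
    segments = []
    start = None
    for x, count in enumerate(vertical_histogram(image)):
        if count > 0 and start is None:
            start = x                      # a chunk begins at this column
        elif count == 0 and start is not None:
            segments.append((start, x))    # the chunk ended just before x
            start = None
    if start is not None:
        segments.append((start, len(image[0])))
    return segments


# Hypothetical toy image: two blobs separated by two blank columns.
toy = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
]
print(vertical_histogram(toy))      # [3, 2, 0, 0, 3, 1]
print(split_on_empty_columns(toy))  # [(0, 2), (4, 6)]
```

On its own this only works when the letters do not touch or overlap, which is why the vertical cut is just one half of the attack described here.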

They say about the MSN scheme:

"Security. A major problem of this scheme is that it is vulnerable to our simple segmentation attack. The segmentation resistance built into this scheme seems to be largely about preventing bounding-box based segmentation, and apparently its designers never realised that a simple color filling process can be used to do segmentation effectively and that a combination of vertical and color filling segmentation can be powerful. Moreover, it is easy to tell arcs from characters by examining characteristics such as pixel counts, shapes, locations, relative positions, and distances to baseline. In addition, the use of a fixed number of characters per challenge also aids our segmentation attack."
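The colour filling idea can be pictured as a flood fill: start at an inked pixel, spread to every inked neighbour, and treat everything reached as one object. The sketch below is an illustration only, not the method from the paper. The function names, the 4-neighbour rule, and the `min_pixels` threshold are assumptions, and it uses pixel count alone to throw away small objects, whereas the authors also mention shapes, locations, relative positions, and distances to baseline.

```python
from collections import deque


def colour_fill_components(image):
    """Group inked pixels (1s) into 4-connected objects using flood fill."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill outward from this unvisited inked pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def drop_small_components(components, min_pixels):
    """Keep only objects with at least min_pixels pixels (hypothetical arc filter)."""
    return [c for c in components if len(c) >= min_pixels]


# Hypothetical toy image: two solid blobs with a one-pixel speck between them.
toy = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]
parts = colour_fill_components(toy)
print([len(p) for p in parts])                        # [6, 6, 1]
print(len(drop_small_components(parts, min_pixels=3)))  # 2
```

Because the fill follows connected ink rather than straight vertical cuts, it can separate objects that share columns, which is roughly why combining it with a vertical split is described as powerful.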

The problem is, as Dr Yan says, that once the characters have been separated, it's just a matter of using recognition techniques like neural networks (around since the 1940s). Recognition is very easy today.

Their work was presented to the companies concerned before being made public, of course, and it has contributed to making CAPTCHAs more efficient and robust. The catch is that humans must still be able to decipher them.

Read their full paper here.