Why are captchas getting harder?


Friday, May 14, 2021

/ by Avishek Bera


The News Cover: I am not a robot. And yet, my computer accuses me of being one, constantly. Signing up for a fitness app profile? Captcha. Getting a vaccine appointment? Captcha. Buying dumbbells? Captcha. Ordering cookies online because I have no self-control? Captcha. And the most annoying part?

I don’t always pass these captcha tests on the first try. It feels like captchas are getting harder, and they are. But it turns out there’s a lot more going on behind the scenes than just proving you’re a human. The word captcha is an acronym. It stands for Completely Automated Public Turing Test to tell Computers and Humans Apart.

So there's a little bit of cheating because of the T. There’s, like, a lot of T's in there: “Turing test to tell.” Luis von Ahn invented captchas. In the year 2000, he was a first-year PhD student at Carnegie Mellon University, attending a talk by Yahoo!'s chief scientist. In the year 2000, Yahoo! was like the biggest tech company out there. The talk was about 10 problems they didn’t know how to solve, and one in particular stood out. They had this problem that people would write programs to obtain millions of Yahoo! email accounts.

And the people who did that were spammers. So they just couldn't figure out how to stop it. What we need is a test that can distinguish humans from computers. The test needed to be passable by any human regardless of age, gender, education, or language. That becomes even more challenging because this is a test that a computer should not be able to pass, but a computer should be able to grade.

It's kind of... kind of a paradoxical idea. The epiphany came when they realized that humans are really good at optical character recognition, a.k.a. reading. We read text at all kinds of angles, in different lighting conditions, when it’s bent over the seams of a book, when it’s in scratchy doctor handwriting, and we’ve been training ourselves on how to do this since we were kids. You don't need to be all that smart or know how to spell or anything. You’re just kind of pattern matching.

Computers of the era were really bad at this, making it the perfect test. Captcha programmers would give the computer the correct text so it knew the answer. Then they’d stretch that text and warp it. The computer with the answer would be able to grade it, but a new bot that didn’t have the answer wouldn’t be able to understand it. Having cracked the code, they gave it to Yahoo!, which started using it on its front page for sign-ups. Within a couple of weeks of the first implementation, it was being used millions of times a day. And the test worked.
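The "computer knows the answer, then warps it" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code Yahoo! ran: the function names `new_challenge` and `grade`, the six-character length, and the rotation and offset ranges are all hypothetical, and the actual image rendering is left out.

```python
import random
import secrets
import string

# Hypothetical alphabet for the challenge text.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6, rng=None):
    """Pick the answer first, then plan how each character would be warped."""
    rng = rng or random.Random()
    answer = "".join(rng.choice(ALPHABET) for _ in range(length))
    # Assumed distortion plan: one (rotation in degrees, vertical offset in
    # pixels) pair per character. Drawing the warped image is not shown.
    warp = [(rng.uniform(-30, 30), rng.randint(-5, 5)) for _ in answer]
    # The answer stays on the server; only the warped image would be sent out.
    return {"id": secrets.token_hex(8), "answer": answer, "warp": warp}


def grade(challenge, typed):
    """Grading is trivial for the side that already holds the answer."""
    return typed.strip().upper() == challenge["answer"]
```

The asymmetry is the point: grading is a plain string comparison, while a bot without the stored answer would have to read the warped image.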

It differentiated between humans and computers, and helped stop bots. But in the background, all the letters and numbers humans typed were doing something else: making computers smarter. In 2007, a new version of the test debuted, called reCaptcha. It used two words: one was generated, so that the computer knew the answer. The second word was pulled from a book, or an old, distorted New York Times article, and the computer had no idea what that word was.

When a human got the generated word right, the program assumed they likely got the other word correct as well, though they’d distribute the same word to several other people just to be sure. If there was consensus, they’d approve the word. So many tests were taken that a year's worth of New York Times articles were digitized roughly every four days. Then Google acquired reCaptcha in 2009, and began using the tech to digitize their scanned books and news archive.
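The two-word scheme described here amounts to a gated vote: a guess at the unknown word only counts if the same person got the known word right, and the unknown word is approved once enough people agree. A minimal Python sketch follows. The names `submit` and `consensus`, and the thresholds of three votes and 75% agreement, are assumptions made for illustration; the source does not say what rule reCaptcha actually used.

```python
from collections import Counter


def submit(control_answer, typed_control, typed_unknown, votes):
    """Record a guess for the unknown word only if the control word was right."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # failed the test, so the other guess is ignored
    votes[typed_unknown.strip().lower()] += 1
    return True


def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed-upon word, or None if there is no consensus yet."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None
```

For example, three people who pass the control word and all type "upon" would be enough, under these assumed thresholds, to approve "upon" as the transcription.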

When you repeat this process enough times, you begin to build a robust image library of distorted characters. And eventually, with enough images in this data set, the computer becomes smart enough to extrapolate letters and words from new images. Captchas basically taught computers how to read extremely warped text. In a test by Google in 2014, a human could read their most distorted captchas with about 33% accuracy. Their AI got it right with 99.8% accuracy.

And once the computers got better than humans, the test had to change. Enter reCaptcha V2, which features images instead of text. It served the same purpose: differentiating between humans and computers, and keeping the bots out. But this time Google leveraged the tests by getting humans to teach machines how to identify real-world objects. You might have noticed that V2 tests often have us selecting transportation photos: fire hydrants, traffic lights, crosswalks, and more.

Google uses this data to train their self-driving cars to see these objects, as well as to improve Google Maps. But just like computers learned how to read warped text better than humans, they’re also getting better than us at figuring out these picture puzzles. So much so that the test had to change again, as did the way the computers graded the test. NoCaptcha, and its most recent counterpart reCaptcha V3, verify that you’re human just based on your behavior.

So, how does that work? There’s a secret test constantly running in the background, making this captcha nearly invisible. If you seem bot-like, like if you click around too quickly or type out paragraphs of text in seconds, then they'll make you take a standard picture test or ask you to verify yourself with two-factor authentication. Pretty much. Now, if you use the Web, basically you’re being tracked. That's just it… the idea is now we can tell whether you're a bot or not because we can tell who you are.
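The signals Google actually uses are not public, so the following Python sketch is purely illustrative. It takes the two examples mentioned here, clicking too quickly and typing paragraphs in seconds, and turns them into a rule that either lets a visitor through or falls back to a picture test. The `Session` fields, the function names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    clicks: int         # clicks observed during the session
    chars_typed: int    # characters entered into form fields
    seconds: float      # time spent on the page


def looks_bot_like(s, max_clicks_per_sec=5.0, max_chars_per_sec=15.0):
    """Flag sessions that click or type faster than the assumed human limits."""
    if s.seconds <= 0:
        return True  # no measurable time on the page is itself suspicious
    too_many_clicks = s.clicks / s.seconds > max_clicks_per_sec
    too_much_typing = s.chars_typed / s.seconds > max_chars_per_sec
    return too_many_clicks or too_much_typing


def next_step(s):
    """Stay invisible for plausible humans; otherwise fall back to a picture test."""
    return "picture_test" if looks_bot_like(s) else "allow"
```

A real system would presumably weigh many more signals and produce a score rather than a yes/no answer; the sketch only shows the shape of the decision.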

You can say this is creepy, but from a usability standpoint, that's a lot better as opposed to me having to do some puzzle or whatever. You kind of already know: yup, this is a human. But unlike previous versions of the test, there’s no public-facing answer for what our clicks might be training computers to do. And it’s not clear how long behavior-tracking captcha tests will last before computers outsmart them. It is my belief... that at some point computers are going to be able to do everything that humans can. It may take a while, but at some point they'll be able to.

And so there's not going to be a way to differentiate between a human and a computer.

This was not the first idea we had. Actually, the first idea we had was giving you some images, and then we would ask you, what are these images of? Basically we go find a lot of images of flowers, we’ll give you a lot of flowers, and we would say, ‘hey, can you tell us what these are images of?’ The problem with that is that humans were not that great at it. For one, it kind of required them to spell, and you’d be surprised how bad people are at spelling.

And then secondly, if it’s flowers, people could say plants. Or, with cars, it turned out all cars also had tires, so people could say tire. And so it was kind of hard to get it right. Whereas with text, it’s this beautiful thing where not only are humans trained on it from very early, but also there is a key for everything that we display, like on the keyboard. So it’s like: R, yes, R. T, yes, T. So that’s why we settled on that.
