Typeracer: WPM and Win Expectation

TLDR: No, I can’t beat Sean Wrona. I know because I already raced him 50,000 times.

The name of the blog, Macholytics, is not only a lazy wordplay on my surname. Initially, I created it to have a space to post my thoughts on sports analytics; sports being macho, that’s where the name came from. Admittedly, there aren’t many things less macho than a guy typing away in a dark room. And yes, it always has to be a dark room; the God of stereotypes demands it.


But, like in any race, there are multiple elements here that we love so much in sports: how one handles pressure, whether he can bounce back after a bad start, how he handles fatigue. And, to a certain degree, the phrase “any given Sunday” applies here. Years ago, after racing over 3,000 rounds of Typeracer, I wasn’t any wiser about how big a shot a slower typist has against a faster one. These days I have ways to find out and can finally put this issue to bed.

Typeracer

While it’s fair to admit that keyboard warriors are lame, I’m not gonna dissect Facebook trolls here. Speed-typing competitions have been around since about the 1930s, and as I explained in a story on my other blog, they can get pretty tense. Typeracer is one of the biggest sites for typists to test themselves and race others. Its English universe is alive and kicking 24/7/365; the races are going nonstop. Oh, English is not your cup of tea? In that case, there are multiple other languages with a huge collection of texts and plenty of players to compete against: Indonesian, Spanish, Portuguese, and many more.

A great thing about Typeracer is that it saves every single race, so we’re provided with a crazy amount of data via typeracerdata.com. There, we can see detailed player stats, so we learn that the fastest player overall is arenasnow with 200.8 words per minute, or that keegant has collected the most races (383,561) and wins (310,751).

Now I have to mention some flaws. This methodology is not perfect by any means because the data isn’t. I’m not saying the site is flawed in itself, but for the purposes of this study, I have to mention that Typeracer can be gamed easily. Racers can be picky about the texts they race; for example, long texts with a lot of dialogue tend to be bad for your average. Short texts, by their nature, deviate from the average more than longer ones; you’d get more above- and below-average results on those. That would be fine if there weren’t an option to simply abandon a race with no consequences. If a racer leaves a race without finishing, it’s not saved in his or her race log. So, if you decide to play dirty, short texts offer a huge upside with no risk of a well-below-average result. Plus, Premium users can race themselves on a private track and save only their good results. As a Premium user who types at 125 WPM on average, I can collect dozens of 150 WPM results on this track without breaking a sweat.

All that being said, just because it’s possible to cheat doesn’t mean it happens very often. Sure, quitting happens sometimes, and it’s probably safe to assume that the worst 1% of everyone’s career ends up ditched, not saved. Fact is, the slower players don’t have that much reason to quit races, and the fastest players kind of know each other. Nobody wants to be known as cheating scum. So let’s assume it’s safe to ignore the flaws.

Similarity scores

With all the details readily available, nothing stops me from asking my first question: who are my closest neighbors on the site? I filtered for racers with over 3,000 races on their record and compared them on their speed and variation. Speed is self-explanatory, and variation measures how spread out (far from average) the results are. Racers with higher variation have wider, wilder density curves, while those with lower variation show curves stacked more tightly around their average. More on that later.

I used standard deviations (STDEV) as the unit of measure. I chose the weighting arbitrarily after experimenting with it: speed makes up 70% of the similarity score and variation only 30%. The first smell test consisted of looking for the closest neighbors of the site’s champion, Sean Wrona. He (arenasnow) was a perfect test case for more than simply being #1: Sean has two accounts. The second one, arenasnow2, is slower (by 1.11 STDEV) with a higher variation (0.26) than the original account. Somewhat surprisingly, it only ended up as the second closest neighbor. Truth be told, that #1 account has such a goddamn dominating track record that it really doesn’t have any close neighbors at all. Like Michael Phelps in the world of swimming, Sean stands alone. A user named performancecheck was a little closer to Sean’s original account, but even he was 0.73 STDEV away.
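For readers who want to play with the idea, here is a minimal sketch of a 70/30 similarity score. The exact formula behind the numbers above isn't spelled out, so the weighted sum of absolute gaps is an assumption, and the names `z_scores` and `similarity` are made up for illustration. It also assumes the quoted 0.26 variation gap is expressed in STDEV units.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Express each value in standard deviations from the group mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def similarity(z_speed_a, z_var_a, z_speed_b, z_var_b, w_speed=0.7, w_var=0.3):
    """Weighted gap between two racers in STDEV units (0 means identical).

    Assumption: a weighted sum of absolute gaps, 70% speed and 30% variation.
    """
    return w_speed * abs(z_speed_a - z_speed_b) + w_var * abs(z_var_a - z_var_b)

# Gaps quoted above for arenasnow vs. arenasnow2:
# 1.11 STDEV slower, 0.26 higher in variation.
score = similarity(0.0, 0.0, -1.11, 0.26)
print(round(score, 3))
```

Under these assumptions the two Sean accounts land about 0.855 apart, which at least agrees with performancecheck (0.73) being the closer neighbor. That is a consistency check, not proof that this is the formula used.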

Density curves

When I searched several levels lower and looked into my own account, I found dozens of very close neighbors. We may vary in age, nationality, work, or hobbies, but our typing is indistinguishable. Remember: Sean’s closest neighbor is 0.73 STDEV away from him. In my case, I found 193 racers within 0.35 STDEV and 44 racers within 0.15. That’s a lot of possible friends, and I wanted to know whether the density curves (how often each speed shows up in our records) would be as similar as our overall metrics.
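A density curve in this sense can be approximated by binning race results and counting the share of races in each bin. The sketch below is only an illustration: the 5 WPM bin width is an arbitrary choice and the sample results are made up, not real Typeracer data.

```python
from collections import Counter

def density(wpm_results, bin_width=5):
    """Share of races falling in each WPM bin, keyed by the bin's lower edge."""
    bins = Counter(int(wpm // bin_width) * bin_width for wpm in wpm_results)
    total = len(wpm_results)
    return {edge: count / total for edge, count in sorted(bins.items())}

# Made-up results for illustration only.
sample = [118.2, 121.5, 124.9, 125.0, 127.3, 128.8, 131.1, 133.4, 126.6, 129.9]
for edge, share in density(sample).items():
    print(f"{edge}-{edge + 5} WPM: {share:.0%}")
```

Two racers with similar averages and similar variation should produce similar-looking bins, which is the comparison described above.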

The difference between career average and the last 3,000 races

I decided to download the last 3,000 races of twenty of my closest neighbors and compare them with my own record. That’s where I realized the beauty of Typeracer: people get better by using the site. My average is 127.2 all-time, but 128.7 over the last 3,000. And that’s just me, who has raced barely a few hundred above three thousand. Thebrownfox, one of the closest to me in career average, has raced more than 27,000 times, so it comes as no surprise that his results got massively better: from 127.7 all-time to 135.8 over the last 3,000.

Simulations model

The fact that the results suddenly weren’t as close was actually excellent. If we were all too similar to each other, the simulations wouldn’t give us any meaningful information: nobody would have an edge, and the model would only give us a fistful of useless 50/50 match-ups. The average career similarity within this group was initially 0.06, and it rose to 0.15 when comparing last-3,000 similarity. Suddenly, the group had some relative high achievers with averages over 134 WPM, and a clear-cut favorite emerged.

I simulated each match-up 10,000 times. With 210 match-ups, at 40 seconds per race, racing just each other for 16 hours a day, it would take almost four years to get it done. We’re lucky we have Excel; without the spreadsheets, the suicide rate of this project would be unbearable.
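The simulations themselves were run in Excel and the exact mechanics aren't described here, so the following is only one plausible way to do it: draw a random past result for each racer, compare, and repeat. The function name `simulate_matchup`, the tie handling, and the stand-in race logs (drawn from normal distributions with made-up means and spreads) are all assumptions.

```python
import random

def simulate_matchup(results_a, results_b, n_races=10_000, seed=0):
    """Estimate how often racer A beats racer B by resampling their race logs."""
    rng = random.Random(seed)
    wins_a = 0.0
    for _ in range(n_races):
        a = rng.choice(results_a)  # one random past result for A
        b = rng.choice(results_b)  # one random past result for B
        if a > b:
            wins_a += 1
        elif a == b:
            wins_a += 0.5  # assumption: split ties evenly
    return wins_a / n_races

# Stand-in logs of 3,000 races each; the means and spreads are made up.
gen = random.Random(42)
faster = [gen.gauss(128, 12) for _ in range(3000)]
slower = [gen.gauss(120, 12) for _ in range(3000)]
print(simulate_matchup(faster, slower))
```

Resampling from real logs keeps whatever shape each racer's density curve actually has, which is why it is used here instead of fitting a distribution first.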

Thebrownfox swept the simulation universe, with most of his victories coming in lopsided fashion.

Is variation important at all?

Yes. If I’m guaranteed to race someone faster, I totally want a wildcard who’s capable of pulling 180 WPM but can also get mentally wrecked and post a sub-100 WPM performance. That’s the kind of racer I can get lucky against and beat. If I’m 1-on-1 with someone who averages 150 WPM but has their act together, I don’t stand a chance. In an individual race, variation matters. In a best-of-5, it still does, but less so. Over a 10,000-race sample, variation is close to meaningless.
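Why luck fades as the series gets longer can be shown with a small binomial calculation. This is a sketch under a simplifying assumption that every race is independent with the same win chance; the 35% single-race chance for the underdog is an illustrative number, and `series_win_probability` is a made-up name.

```python
from math import comb

def series_win_probability(p, best_of=5):
    """Chance of winning a best-of-N series given a per-race win chance p.

    Assumes independent races with a constant p.
    """
    needed = best_of // 2 + 1  # wins required to take the series
    return sum(
        comb(best_of, k) * p**k * (1 - p) ** (best_of - k)
        for k in range(needed, best_of + 1)
    )

underdog = 0.35  # illustrative single-race win chance
for n in (1, 5, 21, 101):
    print(n, round(series_win_probability(underdog, n), 4))
```

With these inputs the underdog's 35% in a single race drops to roughly 24% in a best-of-5 and keeps shrinking as the series grows.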

When looking only at players of a similar level (those whose averages were within 3 WPM of their opponent’s), the ones with less variation won a little more than they lost. The trend could be partly because thebrownfox was both #1 in speed and #1 in variation, but also, simply put, less variation goes hand in hand with skill. All in all, variation explained only about 10% of the results.

Less variation is always better, but its impact on such a large sample size wasn’t significant.

WPM leads to Win Expectation

I planned to simply figure out my win expectation against similarly skilled people. It didn’t take long to realize that if I’m paired up against someone who’s five percent slower than I am, which in my case means someone who types 120 WPM, I should win about 65% of the time. Against someone eight percent slower (117 WPM), I should win close to 75% of the time.

Still, it felt incomplete to end the analysis there. I decided to add ten more racers to the simulation, from levels both above and below the 120-130 range I was originally looking at. This expanded the initial results by another 255 match-ups with much more variety in skill levels.
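The match-up counts and the "almost four years" figure can be double-checked with a few lines of arithmetic. The sketch assumes the first group is 21 racers (the author plus twenty neighbors) and the expanded group is 31, with every pair racing each other once per simulated round.

```python
from math import comb

def matchups(n_racers):
    """Number of distinct pairs in a group where everyone faces everyone."""
    return comb(n_racers, 2)

initial = matchups(21)      # author plus twenty neighbors
expanded = matchups(31)     # ten more racers added
extra = expanded - initial  # additional match-ups from the expansion

# Time to run the initial simulation by hand:
# 10,000 races per match-up, 40 seconds per race, 16 hours a day.
races = initial * 10_000
hours = races * 40 / 3600
years = hours / 16 / 365

print(initial, extra, round(years, 2))
```

The numbers line up with the text: 210 initial match-ups, 255 extra ones, and just under four years of real-life racing.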

Oh, and no, I can’t beat Sean Wrona. I know because I already raced him 50,000 times.

Turns out that once you’re about 30% slower than your opponent, such as a 120 WPM racer going against a 170 WPM one (or 80 vs. 115), your chances are as good as dead.
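As a rough cross-check on these win expectations, one can treat each racer's results as normally distributed and compute the chance that one draw beats the other. This is a swapped-in approximation, not the simulation method used for the results above, and the 12 WPM spread is a made-up value chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def win_expectation(mean_a, mean_b, sd_a, sd_b):
    """P(A posts the higher WPM), assuming independent, normal race results."""
    z = (mean_a - mean_b) / sqrt(sd_a**2 + sd_b**2)
    return NormalDist().cdf(z)

SD = 12  # hypothetical per-race spread in WPM, same for both racers

print(round(win_expectation(127, 120, SD, SD), 3))  # about 5% slower opponent
print(round(win_expectation(127, 117, SD, SD), 3))  # about 8% slower opponent
print(round(win_expectation(120, 170, SD, SD), 4))  # 120 WPM racer vs. 170 WPM
```

With that assumed spread, the first two cases come out in the same neighborhood as the 65% and 75% figures, and the 120 vs. 170 case comes out well below one percent.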
