This is a fascinating little look at the way words are used by folks online.
Cogsci researchers put lots of effort into analyzing how humans process language, and they’ve developed all sorts of crazy ways to measure it. But with Google grinding through 4 billion webpages, it’s safe to say Google probably has the largest human language dataset that can be easily queried. And something like page counts by word length is an easy way to plumb its depths. If it hasn’t already happened, I bet there’s a dissertation or a Science article waiting to be written on using Google as a cheap (free) text analysis tool for some robust studies. Though, as the post points out, working through large datasets will take an incredibly long time with existing tools. Still, I bet someone out there is up to the challenge.