What data says about Karen Carpenter and Tiquan Underwood

There's this chart that was going around the interwebs: The Vocal Range of the World's Greatest Singers. There's nothing the interweb loves more than a list of questionable origin, except perhaps taking that same list as unquestionable fact.

There's a lot to head-scratch and chin-stroke about this chart, for instance: Buddy Holly, Brian Wilson and Smokey Robinson all share the same "highest note", which is merely one whole step (that is to say, only a little bit) above the highest reaches of Bob Dylan, Tom Waits, and uhm, Eminem.
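To put a number on "only a little bit": a whole step is two semitones, out of the twelve in an octave. Here's a minimal sketch in Python that turns note names into MIDI note numbers and measures the gap between them. The function names are my own, and the octave numbers in the example (F5, G5) are assumed for illustration, since only the note letters are mentioned here.

```python
# Semitone offsets of the natural notes within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def midi_number(note: str) -> int:
    """Convert scientific pitch notation such as 'F5' or 'Bb2' to a MIDI note number."""
    letter, rest = note[0].upper(), note[1:]
    offset = NOTE_OFFSETS[letter]
    # Apply any sharps (#) or flats (b) that follow the letter.
    while rest and rest[0] in "#b":
        offset += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    # By the common convention, C4 (middle C) is MIDI note 60.
    return 12 * (octave + 1) + offset


def interval_in_semitones(low: str, high: str) -> int:
    """Distance from the lower note to the higher note, in semitones."""
    return midi_number(high) - midi_number(low)


# Octave numbers are assumed; only the letters F and G come from the text.
whole_step = interval_in_semitones("F5", "G5")
print(whole_step)  # 2 semitones, i.e. one whole step
```

The same function gives a singer's total range if you feed it their lowest and highest notes, which is all the chart is really measuring.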

These two sets of three singers might seem reasonably grouped, albeit closer together than you might expect, especially between the growls and barks of Waits/Dylan/Eminem and the delicate stylings of Buddy/Brian/Smokey.

And then there's something in the data that makes you think "No, no, no. Something's f'd up." Waits/Dylan/Eminem's high F is the same as Karen Carpenter's highest note.

Using vocal range as the metric for ability, we're looking at a data set that suggests Karen Carpenter was less of a singer than Eminem (who has another whole octave in his range below Karen's). All these singers pale in comparison to, yeah, you guessed it, Axl Rose: Greatest Vocalist Ever. People Magazine, pushing misinterpreted data into a headline since 1974.

Instead of trying to describe the details of this prima facie ridiculous suggestion, let's take a moment here and listen.

Tara McGinley writes about Karen Carpenter (2012, Dangerous Minds), and includes some amazing mostly unaccompanied tracks on YouTube, from the very fun site Studio Multitracks.

And Karen's buttery alto was just part of the package: she could really wail on the drums:

The rest of the story is more sad and complex than I can do justice to here (read more), but despite being tragically un-hip in an overly hip age, this much we know:

"She was one of the greatest voices of our lifetime" -- Elton John

Karen wasn't a soprano, nor an operatic acrobat, but Eminem, Kurt Cobain, and Lou Reed hardly seem like choir cohorts in any sense at all. Yet in this dataset they all line up like stars of the same constellation.

Isolating specific datapoints (a singer's range, a person's ethnicity, a woman's attire) as indicative of larger conditions is just what we, as mere human beings, do, whether it's reading annual local weather patterns for signs of change in global atmospheric composition, treating standardized test scores as indicative of future contributions to humanity, or holding a hand on a forehead to diagnose fever and infection.

We search for patterns and interpret themes and variations as leading indicators. We search the skies, a lover's expression, or last quarter's spreadsheets for some cause-and-effect that ties our perception of the world out there to the experience of our solipsistic self.

Finding and gathering details to compare across different people, or different scenarios, is one of the biggest parts of technology today. Finding these measurables to track is the first step on the rigorous path to optimization and growth, which we reflexively and culturally assume is the one abiding goal.

Intuitively, perhaps, you'd think each hard earned morsel of information is meaningful — our inner story teller finds a way to connect from A to B. Or that fascinating correlations (say, the number of people who drowned by falling into a pool and the number of films with Nicolas Cage) could have some cause and effect.
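It's easy to manufacture one of these fascinating correlations yourself. Here's a minimal sketch in Python, using two series I made up for illustration (they are NOT real statistics, and the variable names are hypothetical): both simply drift upward over eight years, and that shared drift alone is enough to produce a correlation coefficient close to 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented yearly figures for two unrelated things that both happen to grow.
streaming_subscriptions = [10, 14, 19, 23, 30, 34, 41, 45]
coffee_shops_in_town = [200, 212, 221, 236, 240, 255, 262, 275]

r = pearson(streaming_subscriptions, coffee_shops_in_town)
print(f"correlation: {r:.2f}")  # close to 1, with no causal link at all
```

Nothing in the arithmetic knows or cares whether one thing causes the other; the storytelling is entirely ours.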

More data isn't always helpful; sometimes more data is just more noise (see Nate Silver's book, which I've mentioned before). Taking "noise" quite literally, this quest has been a part of my entire life: finding (or creating) the underlying meaning and structure beneath a façade of complexity.

Seemingly irrelevant data can sometimes have a life all its own. Fantasy Football is a game made off the data exhaust of the National Football League, with 27 million people "owning teams" and competing, with an entire industry valued at around $800 million (from Grant McCracken's Culturematic, 2012). The numbers that correlate to points, and wins and losses, in fantasy football often don't reflect who wins the games in the real world.

The thirst for data from football extends beyond the standard statistics used in fantasy football, to the databases generated by sites like Pro Football Focus (PFF) and Football Outsiders. These sites use fans to grade each play or player during every game, generating data far beyond what is commonly available (whether the data gathered reflects an accurate judgement, while not knowing what the player was assigned to do, is another question altogether). Both sites market their data to paying data-hungry fans to make judgements for fantasy leagues, and provide data for story lines to the media to keep the 24/7 news tickers spinning.

Does any of this added data help in picking winners and losers on the football field? Football Outsiders found an incredibly strong correlation in their data (2006):

There's only one conclusion to be drawn from the data. If you want to win, you have to call more kneel plays. (link)

A fun, rather Onion-esque observation (a quarterback kneel is typically only taken by the winning team to run time off the clock at the end of the game). Football Outsiders is always an entertaining and informative read, and they have created new metrics and a new style of sports journalism, but there is no magic hidden calculus to be found.

And now another slight diversion, to capture an anecdote from my life, with regard to football and data, from The Boston Herald, 2008.

The news that the Patriots worked out receiver Bethel Johnson last week sent one reader scurrying to his hard drive.

Last year, Pats fan Jason Uechi of New Jersey read a USA Today story about Tom Brady and was struck by one passage:

His father recalls a regular-season game several years ago that the Patriots won handily, yet his son was fuming afterward. He had suffered an interception because the receiver ran the wrong route. That receiver did not have another ball come his way the rest of the season, according to Brady Sr.

Uechi was intrigued. Could he guess the object of Brady’s scorn? He downloaded play-by-play data from 2002 through 2006 into a 2,322-line Excel spreadsheet. He then sorted each receiver by interceptions, receptions and incompletions as the intended target.

And then this line jumped out at him from a 29-6 victory over the Bills on Nov. 14, 2004:

(Shotgun) T.Brady pass intended for B.Johnson INTERCEPTED by N.Clements at BUF 17. N.Clements to NE 48 for 35 yards (T.Brady)

“Tom even had to make the freakin’ tackle,” Uechi noted. “Yeah, I’d hold a grudge, too.”

And sure enough, the data backs up the elder Brady’s claim. Johnson had six receptions to that point. Another two passes thrown his way had been intercepted. And another seven fell incomplete. But from that day forward, Brady did not throw a single pass to Johnson for the rest of the season. Seeing only the data, it would be fair to assume he had been cut the next day.

((Oh my John Tomase, just 30 days hence. #toosoon))
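For the curious, the spreadsheet exercise from the Herald story can be sketched in a few lines of Python rather than Excel. This is only a rough reconstruction under assumptions: the three regular expressions are modeled on the single play-by-play line quoted in the excerpt, the two other sample lines (and the receiver "X.Receiver") are invented to exercise the patterns, and real play-by-play feeds vary in wording.

```python
import re
from collections import Counter, defaultdict

# Assumed wording for each outcome; only the interception format is taken
# from the quoted line, the other two are guesses at similar phrasing.
PLAYER = r"(?P<target>[A-Z]\.[A-Za-z'-]+)"
PATTERNS = [
    ("interception", re.compile(r"pass intended for " + PLAYER + r" INTERCEPTED")),
    ("incompletion", re.compile(r"pass incomplete to " + PLAYER)),
    ("reception", re.compile(r"pass to " + PLAYER)),
]


def tally_targets(plays):
    """Count interceptions, incompletions and receptions per intended receiver."""
    tallies = defaultdict(Counter)
    for play in plays:
        for outcome, pattern in PATTERNS:
            match = pattern.search(play)
            if match:
                tallies[match.group("target")][outcome] += 1
                break  # one outcome per play
    return tallies


sample_plays = [
    # The one real line, as quoted in the Herald excerpt.
    "(Shotgun) T.Brady pass intended for B.Johnson INTERCEPTED by N.Clements "
    "at BUF 17. N.Clements to NE 48 for 35 yards (T.Brady)",
    # Hypothetical lines, invented only to exercise the other two patterns.
    "T.Brady pass incomplete to X.Receiver.",
    "T.Brady pass to X.Receiver to BUF 40 for 12 yards.",
]

tallies = tally_targets(sample_plays)

# Sort receivers by interceptions, then receptions, then incompletions.
ranked = sorted(
    tallies.items(),
    key=lambda item: (
        -item[1]["interception"],
        -item[1]["reception"],
        -item[1]["incompletion"],
    ),
)
for receiver, counts in ranked:
    print(receiver, dict(counts))
```

Adding the game date to each tally would show the other half of the story: the point after which a receiver's line simply goes quiet.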

The annual NFL Draft is another spot where there is a plethora of data available. Graduating college students and eligible underclassmen are measured for height, weight, strength, speed, leaping ability, intelligence, and any other thing a football scout can measure (by some accounts, approximately 3,500 students are scouted annually). Each team takes these same measurables, watches piles of game tape and practices, and interviews coaches all the way back through high school to see if the kid is worthy of being one of about 500 (both drafted and undrafted) wearing an NFL jersey for the start of training camps.

These figures are compared to every other player in the draft, and compared to current and the most successful professional players. Teams have a defined profile for the physical attributes they want at every position, and compress the data to an internal code that will provide an at-a-glance summary when they are ready to pick.

For fans, a cottage industry exists in draft reports and magazines that break down the numbers on every top player. The NFL Draft is now a huge fan event, getting higher TV ratings than actual playoff games for basketball or hockey aired at the same time.

So with all this data available, is picking the player most likely to succeed in the NFL, deciding who to draft, a problem that can be solved? Turns out, not so much. Cade Massey (Wharton School) and Richard Thaler (Univ. Chicago) did a study that showed:

There is skill in making individual picks, Massey says, but the fact that draft success isn't sustainable points to the conclusion that every team is fairly evenly matched (Deadspin).

It's really kind of a crapshoot, even with all the data, and all the research, and all the historical comparisons. What does make sense, in a league with a salary cap like the NFL, is to stock up on more picks where the price and value make the most sense, like the second and the third round. As a Patriots fan, this sort of thing normally drives us crazy, with Belichick trading down out of the first round and all the high profile players, to pick a bunch of relative unknowns on the second day.

In the end, many of the measurables you can gather before the draft don't correlate to success. For all positions, there is a floor for size and speed (which has changed over the years, with players getting both bigger and faster) where the laws of physics apply: you don't want to stop a 250 pound linebacker running at a quarterback with only a 150 pound blocker in the way. But is there much to learn from how fast an offensive lineman can run 40 yards, when he rarely has to run more than 10 on any given play? And beyond all that, as a team sport, how much can you measure of purely individual traits that matter to wins and losses?

You can't guarantee every 6'5" receiver will be as successful as Calvin "Megatron" Johnson (4.35 40 yard time; picked second in the draft in 2007, Detroit). But you can find Tiquan Underwood (6'1", but 6'5" with the high-top fade; 4.31 40 yard time; picked 253rd, 2009, Jacksonville) and he can, at least in theory, provide some of the value at a much lower cost (and I say "in theory", aware Ti is not Megatron, and aware that the rare physical gifts of a true superstar can be a difference maker for a team).

But we cheer for teams, and we cheer for players, not for cells in a spreadsheet. We cheer for the stories we hear, and that we imagine, for seeing hard work reward the nice guys like Tiquan, no matter what the crowd-sourced data might proclaim (the grades at PFF had Ti ranked higher than Brandon LaFell as free agents this offseason, for what it's worth).

Even while all the drama of the draft makes for exciting television, and the data exhaust makes for exciting ways for fantasy fans to compete and enjoy the game, the real reasons for success for an individual player remain mysterious, invisible and close to impossible to quantify: solid coaching, leadership in the locker room, a system for player development, a stable organization, and their desire to put in the hard work to succeed, or maybe a deep personal fear of being an insurance salesman.

But, even after the passing of the King of the Countdown, we live in a listicle culture, where any nuance is lost to "just tell me who's number one".

Which brings us back to that list, and to Karen.

The vocal range chart, put together by ConcertHotels.com, was based on data gathered by The Range Place, an online community for discussion of singers. Much like the graders for Pro Football Focus, the work is done by passionate and knowledgeable fans. A spot check for a few of the highest notes proves to be both easy to validate (the site lists the song, and timing, of the note in question), and rather accurate.

But here's the rub, and I don't mean to demean the hard work and dedication put in by the members of The Range Place, but: vocal range doesn't mean singing range. Anybody can have a wide range if measured from the sounds you might utter on the toilet to the sounds you might squeal when burned on a stove. But neither of those is necessarily sung.

Tom Waits, "Heart Attack and Vine" live, at the 1:09 mark hits what the list calls his "high note", which can be annotated here as <<screams />>

The folks at The Range Place make the distinction between "sung vocal range" and "total vocal range", but not for everyone on the list. Thus, in lieu of original analysis, the list makers at ConcertHotels presented the available data, mashed up with a Rolling Stone Top 100 list, generating considerable link-bait chum but limited scrutiny.

There are, however, some interesting things to consider in this space between singing range and vocal range:

1) The reason Mariah Carey is so much higher than the rest is her ability in the whistle register. Historically tied to coloratura sopranos in opera, the technique was brought to pop music by the amazing Minnie Riperton ("Lovin' You", her biggest hit, see 0:56. Many more videos available).

Physiologically, we can all access the same register — but it's a rare talent who can control it, and actually be able to turn it into phonemes.

2) If we were truly to decide that vocal range is the way to pick the greatest vocalist, then we'll have to take a trip to Tuva and its surroundings, for overtone singing, where the singer sings a very low fundamental and manipulates its overtones, thus singing two or more notes at a time.

Now that's pretty freakin' amazing.

Kongar will never make it onto a top 100 list of Western singers, and there is no way to make an objective statement about the "best" singer, anyway. So many of our preferences in music are wrapped up in culture and history, and personal experience. You are who you listen to, perhaps, and once the complexities of identity come into play any objectivity goes out the window.

And here's where some sort of quantitative comparison might be appealing. Of course, we've always used sales and money (i.e. Billboard) as the way to pick the winners, although that too is not without bias. It's not a surprise that a list like this one grabs the interweb's attention, but when it's made of noisy data (vocal range versus singing range), and pre-wrapped in cultural assumptions (Rolling Stone's top 100), it's hard to see past the link-bait chum of "you won't believe who sings even higher" to find much real value.

And that also leads us back to a key point: "Linkbait sans scrutiny" is not an effort to promulgate Axl Rose's superiority, nor even convince uncaring history books that Bob Dylan was actually a tenor (no, he wasn't). So much content online is to simply get you to look, and to click, and thus be able to turn the hamster wheels of the internet cookie machinery to gather data about you and to develop an ever more detailed profile of your purchasing probabilities and your web behavior — like Karen sings in "Superstar": You said you'd be coming back this way again baby.

Baby, baby, baby, oh baby, the measurable, online, is you.