Tech is a huge and essential part of how we process and represent complicated information, but as we’ll see, some of the best data insights came long before the first electronic computer.
Ever wonder why the United States Census is only carried out every ten years, when it takes just a few hours to count the millions who tuned in to “NCIS: New Orleans” last night?
Aside from the fact that the Constitution calls for exactly that, this timing results from a problem as old as the Census itself. Legislators were faced with a big data dilemma over two centuries before the term “big data” was even invented: they needed to process U.S. Census data in a meaningful way.
The early censuses produced mountains of data, and all that information needed to be processed without computers. Impressively, the United States government produced statistical atlases with a remarkable sense of data visualization. And in so doing, late 19th- and early 20th-century government tabulators set the standard for creating meaningful representations of huge swaths of data, one that companies are following to this day.
The Four V’s
In 2012, IBM released a framework for big data called the “Four V’s”: Volume, Variety, Velocity, and Veracity. The model assumed a lot about the average big data analyst, including their computing power, processing speed, and knowledge of the data at hand. Unfortunately for the data collector at the turn of the 20th century, there were few tools available to process the V’s of big data beyond their own brains.
And yet an infographic about measles deaths from the Twelfth Census of the United States (1900) seems to understand so much about the “Four V’s” that it could originate from 2010 Census data (if not for the outdated font and browned hue). By combining several dimensions of data (time, death toll, and location), Henry Gannett, the chart’s creator, clearly saw the value in the “Volume” of his data.
In a time when data collection meant pen, paper, and door-to-door canvassing, Gannett was restricted by the census’s inability to perform at a high “Velocity.” With a disease as deadly as measles, any data would prove valuable in fighting an outbreak. Gannett collected and published the data and infographic from the Twelfth Census in three years — a respectable velocity in a time when homes didn’t have electricity.
The “Variety” and “Veracity” of the census data, the final V’s in IBM’s big data framework, reinforced the infographic’s authority and potential impact. To visualize big data correctly, the data needs to be accurate and relevant, and cleaning data like that requires a talented team and strong commitment. Faced with unreliable data collection standards and medical records that would give any modern data analyst nightmares, Gannett designed visualizations that not only made the data understandable, but also provided actionable steps to create real change.
Emerge From Big Data
Given the massive limitations, Gannett’s work is a rousing victory, both for big data best practices and for public health. This work, along with many others from the Statistical Atlases of the United States in that period, demonstrates the real value of data that’s not only comprehensive and relevant, but also visualized in an effective, understandable way.
Though we aren’t the first to suggest adding one more V to IBM’s framework, it’s something we believe in strongly: no matter the Volume, Velocity, Variety or Veracity of your data, all four elements must be “Visualized” properly when attempting to gain actionable insight from your mountain of data.
Whether you’re just starting to tackle big data and data visualization or are already well on your way, make sure to take a step back from the database and evaluate your data’s moving parts to understand what it’s all really worth.