"Isaac Asimov's Big Data 101"
Updated: Sep 10, 2018
I recently saw a video in which people on the street were asked about Big Data. While the hawkers and vendors didn't have a clue about it, a good number of people gave intelligible answers, and others mentioned that they had often heard friends and family use the term. This buzzword has created a lot of interest and has already attracted upwards of $1 billion in investment over the last few years. It has been hyped as the best thing since sliced bread (to quote a commonly used cliche). The truth is, the idea of Big Data has been around since before the Chorleywood bread process industrialised the sliced white loaf back in 1961. Isaac Asimov's Foundation trilogy (1940s and 1950s) introduced an important character, Hari Seldon, who developed a science called psychohistory (which now has a different meaning in the healthcare industry). In the series, Seldon was able to make predictions about the decline of the then-powerful Galactic Empire. Sound familiar? Seldon is also shown orchestrating events that manipulate people in order to achieve his goals. Does this sound familiar? Maybe not.
In the same spirit of analysing large volumes of different types of data (Big Data), companies are now clamouring to implement solutions that harness its potential: to better understand their customers, automate analysis and predict trends in the business environment that will affect them. Companies were analysing their data even in Asimov's day, and they will continue to do so.
However, this has assumed greater significance due to the nature of competition, the wide variety of options available to consumers and technological advancements that have reduced the time to market for products and services. While it might be common sense to say that analysing data will yield insights crucial to an organisation's success, it is important to recognise that there are three types of analytics:
1) Descriptive analytics: Put simply, this type of analytics looks at past and current performance by mining historical data to explain success or failure. Sales, marketing, operations and finance functions use this kind of analysis quite often. Descriptive analytics accounts for the majority of reports in the business world, and for most of the tools and products on the market that produce them.
2) Predictive analytics: This uses a variety of techniques from statistics, modelling, machine learning and probability theory to identify patterns in historical and transactional data, highlighting inherent risks and opportunities in the marketplace. True to its name, the models developed predict what could happen next. For example, you could be a large retailer attempting to predict what to use on your summer line of clothing: floral patterns or pastel colours.
3) Prescriptive analytics: This takes analytics a step further. Using mathematical and computational sciences together with business rules, it factors in predictions to tell you what is likely to happen, when and why, and suggests options and methods for taking advantage of those predictions. It also shows the implications of each decision path and the factors influencing it, without compromising priorities. Prescriptive analytics learns continuously by taking in new data of different types (image, video, audio, email etc.), crunching it further and prescribing new actions, thereby improving the accuracy of its predictions and the suitability of the options it recommends.
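To make the distinction between the three types concrete, here is a minimal Python sketch on a toy dataset. Everything in it is an assumption made for illustration: the monthly sales figures, the function names (describe, predict_next, prescribe), the straight-line trend used as the "model" and the profit and holding-cost figures. Real predictive and prescriptive systems are far more sophisticated; this only shows how each type builds on the one before it.

```python
from statistics import mean

# Hypothetical monthly unit sales for one product (illustrative data only).
monthly_sales = [100, 110, 120, 130, 140, 150]


def describe(sales):
    """Descriptive: summarise what has already happened."""
    return {
        "total": sum(sales),
        "average": mean(sales),
        "best_month": sales.index(max(sales)) + 1,  # months numbered from 1
    }


def predict_next(sales):
    """Predictive: fit a straight-line trend by least squares and
    extrapolate it one month ahead."""
    n = len(sales)
    months = list(range(1, n + 1))
    x_bar, y_bar = mean(months), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
        (x - x_bar) ** 2 for x in months
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n + 1)


def prescribe(forecast, stock_options, unit_profit=5.0, unit_holding_cost=2.0):
    """Prescriptive: given the forecast, recommend the stock level with the
    highest profit. The profit and holding-cost figures are made up."""

    def profit(stock):
        sold = min(stock, forecast)
        unsold = max(stock - forecast, 0)
        return sold * unit_profit - unsold * unit_holding_cost

    return max(stock_options, key=profit)


summary = describe(monthly_sales)            # what happened
forecast = predict_next(monthly_sales)       # what could happen next
decision = prescribe(forecast, [140, 160, 180])  # what to do about it
print(summary, forecast, decision)
```

The descriptive step only looks backwards, the predictive step extends the pattern forwards, and the prescriptive step turns that forecast into a recommended action by comparing the consequences of each option.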
Hari Seldon's work used all these types of analytics in the ground-breaking, ego-crushing revelations he made to the rulers of the Galactic Empire. Over the last few decades, Knowledge Discovery in Databases (KDD), an interdisciplinary subfield of computer science, has gained considerable importance by adopting the same principles. Organisations that plan to discover the wealth in their data will implement information solutions encompassing all three flavours of analytics to secure a competitive advantage in an environment of intense competition, technological advancement and customers with a great propensity to switch loyalties.
The growing challenge of big data has led to rapid growth in the number of people using data visualization techniques to make sense of the data. Recently, I came across an article on Gizmodo about a collection of bad visualizations. In the effort to produce attractive infographics and diagrams, such visualizations risk undermining belief in the value that data visualization brings to big data analytics. The idea of analysing different kinds of data to gain insight has been around for quite a while, and it follows that the same holds for data visualization. It is important to understand the goals of data visualization in order to deliver its potential value efficiently.
While the phrase data visualization might seem self-explanatory, I would like to stress that data visualization serves purposes beyond what the name implies. Broadly, good data visualizations fall into two categories: those that serve informatory needs and those that serve exploratory ambitions.
Informatory visualizations serve the purpose of reporting: one measures some underlying drivers (for example, customers, prospects, competitors or market opportunity) over a period of time to identify how the enterprise is aligned. In other words, they help in visualizing the what and when of the information, conveying complex data in a visually engaging manner.
Exploratory visualizations, on the other hand, help in understanding the how and why of the information. These visualizations help identify previously unknown relationships, correlations, patterns and models in the data. They serve an investigative purpose: they help answer why a particular situation has occurred and predict the risks involved in realigning to a pre-defined path or going down a new one. By facilitating interaction and engagement through deeper visual drill-downs, they make it possible to identify the threads that weave the data tapestry together.
Effective data visualization is predicated on understanding the drivers underlying the data. This requires knowledge of the context in which the data was collected and the audience that it is intended for. It is also important to ensure that the quality and integrity of the data are of an acceptable standard before attempting to visualize it and draw meaning from it. Garbage in will only result in garbage out. Furthermore, while using narratives to illustrate a trend in data is good practice in informatory visualizations, they should be objective in nature so as not to influence the user's interpretation. Jim Stikeleather, in his article in the Harvard Business Review, explains how comprehension by the user is heavily dependent on the semantics of the visualization. Designer bias, which includes the choice of colours, design elements, chart types and 2D or 3D effects, influences the user's interpretation, and care must be taken to ensure that such features, which are independent of the data, do not compromise the story the data actually tells. Edward Tufte created a formula to quantify a lie factor that shows how misleading a graph could be under the influence of some of these biases. It is calculated by dividing the size of the effect shown in the graphic by the size of the effect in the data; a value close to 1 indicates that the graphic represents the data faithfully.
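The lie factor calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the function names (effect_size, lie_factor) are made up for this example, "size of effect" is taken to mean the relative change between two values, and the numbers describe a hypothetical bar chart, not a real one.

```python
def effect_size(first, second):
    """Relative change from first to second, as a fraction of first.

    Assumption: "size of effect" is measured as relative change.
    """
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of the effect shown in the graphic divided by the size of
    the effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data shows no change, so the ratio is undefined")
    return effect_size(graphic_first, graphic_second) / data_effect


# Hypothetical chart: the underlying value rises from 100 to 150 (a 50%
# increase), but the bars are drawn 20 px and 80 px tall (a 300% increase).
factor = lie_factor(graphic_first=20, graphic_second=80,
                    data_first=100, data_second=150)
print(f"Lie factor: {factor:.1f}")
```

In this made-up case the graphic exaggerates the change sixfold. A chart whose bars were drawn 20 px and 30 px tall would score 1, meaning the drawing matches the data.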
Ultimately, good data visualization in the big data era should ensure that the story it presents enables insight-driven action. Data visualizations created with a clear design philosophy that incorporates the above guidelines are far more likely to be insightful, whether informatory or exploratory, and to encourage action, thereby translating into a greater return on investment.