My inspiration for this post is John Allen Paulos's wonderful book "Innumeracy". In it, Paulos argues that many of us are ill-equipped to deal with numbers in the real world, and that by grasping the concepts he presents, we can arrive at a clearer, more quantitative way of looking at the world.

Business Intelligence is arguably the most quantitative area in Information Technology. At its most basic level, BI deals with metrics collected about various business processes, and the way those metrics have to be managed and manipulated depends on their mathematical nature. If that sounds too profound, well, it is intentional, and I urge you to read on!

Any Data Warehouse data modeler will appreciate that the metrics collected in a fact table have to be understood in the context of the fact table's grain: the metrics in a transaction-grain fact table are treated differently from those stored at a periodic snapshot or accumulating snapshot grain. Think of fully additive versus semi-additive facts and you get the idea.
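To make the additive versus semi-additive distinction concrete, here is a minimal sketch in Python with pandas; the snapshot table and its column names are hypothetical, not from any particular warehouse:

```python
import pandas as pd

# Hypothetical periodic-snapshot fact table: one row per account per day.
snapshot = pd.DataFrame({
    "date":     ["2010-03-01", "2010-03-01", "2010-03-02", "2010-03-02"],
    "account":  ["A", "B", "A", "B"],
    "deposits": [10.0, 5.0, 20.0, 0.0],     # fully additive: sums across all dimensions
    "balance":  [100.0, 50.0, 120.0, 40.0], # semi-additive: additive across accounts, not time
})

# Fully additive fact: summing across every dimension yields a valid business number.
total_deposits = snapshot["deposits"].sum()                # 35.0

# Semi-additive fact: balances may be summed across accounts within one day...
daily_balance = snapshot.groupby("date")["balance"].sum()  # 150.0, 160.0

# ...but summing across days double-counts; average (or take the last value) instead.
avg_daily_balance = daily_balance.mean()                   # 155.0
print(total_deposits, avg_daily_balance)
```

Summing the balance column across the whole table would produce 310.0, a number with no business meaning, which is exactly why the grain matters.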

Similarly, a BI report developer deals with numbers on a daily basis. A good understanding of the numbers to be shown on a report (can they be added, averaged, or extrapolated?) is essential to arriving at the right information content and the correct way to visualize them. As a simple example, read Ralph Kimball's classic article (aren't all his articles classics!) on SQL Roadblocks and Pitfalls here, and you realize that to decipher an article exposing the basic limitations of SQL in dealing with moving averages (a very common requirement in BI reporting), we need the ability to think mathematically.
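For a flavor of the moving-average requirement itself, here is a minimal sketch in Python with pandas; the daily sales figures are made up purely for illustration:

```python
import pandas as pd

# Hypothetical daily sales figures -- the kind of metric a BI report would show.
sales = pd.Series(
    [120, 135, 128, 150, 160, 155, 170],
    index=pd.date_range("2010-03-01", periods=7, freq="D"),
)

# A 3-day moving average: the classic requirement that plain SQL (without
# window functions) handles awkwardly, as Kimball's article explains.
moving_avg = sales.rolling(window=3).mean()

# min_periods=1 averages whatever data exists, so the first rows are not NaN.
moving_avg_padded = sales.rolling(window=3, min_periods=1).mean()
print(moving_avg_padded)
```

Even this one-liner hides a decision a report developer must make consciously: what to show for the first days of the window, where a full 3-day history does not yet exist.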

Moving on to the realm of data mining, predictive analytics, and their ilk, we as BI practitioners are starting to tread on areas that require a solid quantitative mindset. In an earlier blog post titled 'The Esoteric World of Predictive Analytics', I argued that traditional statistics is not enough to make sense of predictive analytics when it comes to modeling human behavioral systems, which is what BI applications are all about. More fundamentally, an understanding of probabilities, central tendencies, causation versus correlation, normal distributions, regression models, design of experiments, etc. is becoming very important for BI practitioners, and with sites like the Rice Virtual Lab in Statistics, it is quite possible to get a grasp of the fundamentals in a short time frame.
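For a taste of those fundamentals, here is a small, self-contained sketch in Python with NumPy; the scenario (ad budget driving revenue) and all the numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Central tendencies: on skewed data (e.g., customer spend) the mean and
# median tell very different stories.
spend = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)
print(f"mean = {spend.mean():.1f}, median = {np.median(spend):.1f}")

# Correlation is easy to measure -- but it is not the same as causation.
ad_budget = rng.uniform(0, 100, size=200)
revenue = 50 + 2.5 * ad_budget + rng.normal(0, 20, size=200)
print(f"correlation = {np.corrcoef(ad_budget, revenue)[0, 1]:.2f}")

# A simple least-squares regression model: revenue as a function of ad budget.
slope, intercept = np.polyfit(ad_budget, revenue, deg=1)
print(f"revenue ~ {intercept:.1f} + {slope:.2f} * ad_budget")
```

A few minutes of experimenting with simulations like this teaches more about distributions and regression than many pages of formulas.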

Let me close this blog with a paragraph from 'Innumeracy'. John Allen Paulos writes, and I quote: "In an increasingly complex world full of senseless coincidence, what's required in many situations is not more facts – we're inundated already – but a better command of known facts, and for this a course in probability is invaluable… Probability, like logic, is not just for mathematicians anymore. It permeates our lives."

BI practitioners, whose lofty ideals relate to helping organizations make sense of their customers' behavior, would do well to give their "Quantitative Gene" a push or shove in the right direction.

Thanks for reading. Please do share your thoughts.

Posted by Karthikeyan Sankaran
March 14th, 2010

Comments (3)

Karthikeyan Sankaran - April 26th, 2010

Hi Sankar, thanks for your feedback on the blog. Actually, you caught me on this one. What I wrote in my earlier blog post titled 'The Esoteric World of Predictive Analytics' goes something like what is quoted below. Hope this clarifies.

Why is traditional statistics insufficient? Though the entry into predictive analytics requires that we understand the implications of traditional statistical analysis, statistics by itself is insufficient in the business context. Traditional statistical analysis allows us to understand general group behavior and is primarily concerned with common behavior within the group – the central tendencies. In business we generally develop models to anticipate human behavior of some type. Human behavior is inconsistent, lacks clear causality, and distributions based on human behavior almost always violate the assumptions of traditional statistical analysis (such as normally distributed data and stable mean and standard deviation). The strength of data mining comes from the ability of the associated techniques to deal with the tails of the distributions, rather than the central tendencies, and from the techniques' ability to deal with the realities of the data in a more precise manner.
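To make the point about tails concrete, here is a minimal sketch in Python with NumPy; the "behavioral" series is simulated, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A textbook normal metric: mean and standard deviation summarize it well.
normal = rng.normal(loc=100, scale=15, size=100_000)

# Simulated behavioral metric (spend, clicks, session length): heavy-tailed.
behavioral = rng.lognormal(mean=3.0, sigma=1.2, size=100_000)

for name, x in (("normal", normal), ("behavioral", behavioral)):
    # Fraction of observations beyond the "3-sigma" point -- the tail mass
    # that data-mining techniques have to take seriously.
    tail_share = (x > x.mean() + 3 * x.std()).mean()
    print(f"{name:10s} mean={x.mean():8.1f} median={np.median(x):8.1f} "
          f"tail share={tail_share:.4f}")
```

On the normal series the mean, median, and "3-sigma" rule all behave as the textbooks promise; on the behavioral series the mean drifts far from the median and the tail carries far more mass, which is exactly where the interesting customers live.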

Shankar Viswanathan - April 26th, 2010

" I had argued that traditional statistics is not enough to make sense of Predictive Analytics, when it comes to modeling Human Behavioral Systems which is what BI applications are all about. More fundamentally, an understanding of probabilities, central tendencies, cause and correlation, normal distributions, regression models, design of experiments etc. is becoming very important for BI practitioners and with sites like this one – Rice Virtual Lab in Statistics , it is quite possible to get a grasp on the fundamentals in a short time-frame." What do you mean by "traditional statistics?" Aren't "probabilities, central tendencies, cause and correlation, normal distributions, regression models, design of experiments etc." also traditional statistical concepts/tools? Am I missing anything here?

Ramachandran Sundararaman - March 23rd, 2010

That was classic too. I read the Ralph Kimball article as well. To be fair, I had always been thinking about this limitation; I faced it a couple of times and coded around it with variables. Good one, Karthik. I will read the book.
