What Is Big Data?

Last year - yes, really, it was only last year - a new IT buzzphrase came to the fore: “Big Data”. Whilst those working on the analysis of very large data sets had been using the phrase for a while, it was less than two years ago that it started to invade the public consciousness. As with many ‘new’ IT concepts, understanding of what big data is remains vague (an acquaintance once collated 37 different definitions of another IT buzzphrase, “The Cloud”), but basically it refers to a collection of data so big that it is very difficult or impossible to “mine” with the technology we currently possess in our organisation. By “mine” we mean digging into the data to extract value from it, commonly called “data mining”.

 

Obviously access to bigger technology is a key tool in big data, so what makes big data worthwhile? In a word, analysis. Big data is not about obvious facts and trends; it’s about digging deeper. A simple example: if we know that on average 10,000 bus rides are taken each day, we have a financial case for providing a bus service and a crude measure of bus service performance. If we can go further and say how many people made journeys between which stops at which times over a prolonged period - say 3 years (roughly 11 million bus rides) - then with analysis we can in theory design the ideal bus service, with the right capacity in the right place at the right time, accounting for weekly load patterns and seasonal trends. Delivering the ideal bus service might be a challenge, but at least we would have a better idea of what it would look like. Analysing some 11 million records by time, day of week, month, start and end of journey etc. would be quite a lot of work - and therein lies the heart of the Big Data problem. Imagine what it would be like trying a similar analysis for London Underground, which averages 3.5 million passenger journeys per day.
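To make the bus example a little more concrete, here is a minimal sketch in Python of the kind of counting involved. The ride records, stop names and the `load_profile` function are all invented for illustration; a real analysis would read millions of rows from ticketing data rather than a five-item list.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical ride records: (boarding time, start stop, end stop).
rides = [
    ("2014-03-03 08:05", "Station", "Market"),
    ("2014-03-03 08:20", "Station", "Market"),
    ("2014-03-03 17:40", "Market", "Station"),
    ("2014-03-08 11:15", "Station", "Hospital"),
    ("2014-03-10 08:10", "Station", "Market"),
]


def load_profile(records):
    """Count rides per (day of week, hour, start stop, end stop)."""
    counts = Counter()
    for stamp, start, end in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        counts[(DAYS[when.weekday()], when.hour, start, end)] += 1
    return counts


profile = load_profile(rides)
# The single busiest combination of day, hour and journey.
busiest = profile.most_common(1)[0]
print(busiest)
```

The same grouping extended by month would expose the seasonal trends; the difficulty with big data is not the logic, which is simple, but running it over years of records.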

Big data is essentially about large-scale statistical analysis, and the ability to fragment large bodies of data into small groups of similar behaviour. We can apply it to customers (the big supermarkets have been doing it for many years with their loyalty card schemes), to diseases and health problems, to transport, TV viewing patterns, politics, the weather, advertising performance, energy consumption, stock market performance - the list of possible applications is extensive.

What we get from the analysis is detailed insight - a better understanding which we may use for design of new products and services, forecasting of demand, and the general optimisation of how we provide what we provide to whom - which is why big data is so relevant to business.

Going back to the introduction momentarily, I said that big data is that which is “very difficult or impossible to ‘mine’ with the technology we currently possess in our organisation”. For a small organisation with one or two small computer servers, big data can actually be quite small, but one of the key factors which has changed the big data landscape and brought it to prominence is the cloud - the availability of large-scale computing on demand via the Internet. The cloud has meant that organisations large and small have been able to address big data problems which previously were out of their reach.

An example: earlier this year I was asked by the new owners of a small business (c. 10 staff, £1.5M turnover) to analyse their data to help them test their new business strategy; they thought they needed to invest substantially in their web platform to facilitate more Internet sales. The company had one small server and 10 years of sales data. Moving this data into the cloud permitted analysis which revealed three things. The huge majority of their web sales, which made up the bulk of their orders and workload, were so small as to be loss-making. Mid-range sales largely came in by phone or email from corporate account customers and were comfortably profitable. And 40% of the company’s profit was generated by 1% of sales orders, from large key account customers. Unsurprisingly, the new owners reworked their business strategy to de-emphasise small web sales and invested in key account salespeople. In the scale of things this was hardly headline-grabbing big data, but use of cloud computing made it easy to produce analyses which would have been very difficult on their own small IT system. The analysis process also revealed product trends which led the business to refocus their offerings.
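For the curious, the shape of that analysis can be sketched in a few lines of Python. This is an illustration only, not the code used for the client: the orders, the £50 and £2,000 band thresholds, and the `band` and `summarise` functions are all assumptions made up for the example.

```python
# Hypothetical orders: (channel, order value in £, profit in £).
orders = [
    ("web", 12.0, -1.5),
    ("web", 18.0, -0.5),
    ("web", 25.0, -1.0),
    ("phone", 400.0, 60.0),
    ("email", 650.0, 90.0),
    ("key account", 9000.0, 100.0),
]


def band(value):
    """Assign an order to a size band (thresholds are illustrative)."""
    if value < 50:
        return "small"
    if value < 2000:
        return "mid-range"
    return "large"


def summarise(order_rows):
    """Total the order count and profit for each size band."""
    summary = {}
    for _channel, value, profit in order_rows:
        row = summary.setdefault(band(value), {"orders": 0, "profit": 0.0})
        row["orders"] += 1
        row["profit"] += profit
    return summary


summary = summarise(orders)
total_profit = sum(row["profit"] for row in summary.values())
for name, row in summary.items():
    share = row["profit"] / total_profit
    print(f"{name}: {row['orders']} orders, profit share {share:.0%}")
```

Seeing order counts and profit share side by side for each band is what exposes the mismatch between where the workload is and where the money is made.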

So is big data applicable to your business? Experience shows that it can help almost any business that has access to a decent amount of transactional history, because it provides the means to create granularity, or detail, out of history. Essentially, big data is about the creation of small nuggets of information which allow us to target our activities much more accurately. Knowledge is power, and those businesses which have embraced big data are demonstrably more successful than those which have not.

Obviously the exploitation of big data depends on the availability of data to start with, and here again the Internet has made a huge difference: through it we are able to access more data than ever before, and that data is growing rapidly. The “Internet of Things” - devices which are not computers but are connected to the Internet - will create another step change in the amount of data available for analysis.

A last word about scale: the analysis for the little company I referred to cost a few hundred dollars of cloud computing power bought from Amazon - a small investment for the strategic decision-making it enabled. The UK Met Office has just announced that it is spending £97 million on a new supercomputer with 480,000 processors so that it can improve UK weather forecasting. Big is relative: to a flea a dog is huge; to the Sun our Earth is little more than a marble on the cosmic carpet. Big data is similarly relative; it is about the creation of new information which was not previously possible for your organisation.