Big data analytics: what are you, descriptive or advanced?
Recently, I attended an online webinar hosted by SAS. The topic was five essential data management practices for analytics, but it was the presenter's explanation of the two types of analytics that I found most interesting. The presenter, Lisa Dodson, Manager, Data Management, SAS Americas Technology Practice, began by clarifying the term analytics. Lisa suggested that it is an all-encompassing and broadly used term, so for the purpose of the presentation she divided analytics into two groups.
The first group is descriptive analytics, which refers to the analysis of historical data. How are we performing now, and how did we perform in the past? How well did we do last year or last quarter, and, more importantly, why? As the questions become more predictive in nature, we move into advanced analytics.
In advanced analytics, we ask questions that look forward: how will we do next month, or next year? This means applying the results of the past to predict the trend into the future. From these results, we are in a position to determine our next course of action and how to address the indicators identified. In short, descriptive analytics looks at past results to identify trends, while advanced analytics carries that data forward into the future, projecting the trends and providing potential insights.
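To make the distinction concrete, here is a minimal Python sketch using made-up quarterly sales figures. The data and the function names `describe` and `project` are hypothetical and do not come from the webinar. The descriptive function summarises what has already happened; the predictive one assumes a simple straight-line trend, fitted by ordinary least squares, and extends it forward. Real advanced analytics would use far richer models, so treat this purely as an illustration of looking back versus projecting ahead.

```python
from statistics import mean


def describe(history):
    """Descriptive analytics: summarise what has already happened."""
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return {
        "total": sum(history),
        "average": mean(history),
        # Change between the two most recent periods (None if only one period).
        "last_change": changes[-1] if changes else None,
    }


def project(history, periods_ahead=1):
    """A simple predictive step (an assumed linear model): fit a straight-line
    trend to the history by ordinary least squares and extend it forward."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    xs = range(n)  # period index 0, 1, ..., n-1
    x_bar, y_bar = mean(xs), mean(history)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # The last observed period is n - 1, so the target is n - 1 + periods_ahead.
    return intercept + slope * (n - 1 + periods_ahead)


if __name__ == "__main__":
    quarterly_sales = [100, 110, 120, 130]  # hypothetical figures
    print(describe(quarterly_sales))
    print(project(quarterly_sales))
```

With the sample history `[100, 110, 120, 130]`, the trend is a steady 10 per quarter, so the projection for the next quarter is 140.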
Lisa indicated that “frankly there is a paradigm shift that is happening from a data strategy perspective, as organisations move from descriptive into advanced analytics”. Close to 85% of her conversations with clients, she said, revolve around the business and IT. A typical exchange goes like this. Someone using advanced analytics asks IT, “I need data to answer some questions on customer churn. I need to understand which customers will leave, so I need the customer data.” IT asks the simple question, “Can you be a little bit more specific, please?” The response: “No, I can’t be specific as yet.” The conversation stalls as each side waits for the other. IT is wondering how it can supply the data without knowing what the user needs, while the analyst is wondering what the issue with their request is.
The presentation then turned to the paradigm shift itself: a change in the thought processes of advanced analytics practitioners and traditional IT. The shift moves IT from descriptive analytics, the historical approach to data analytics, towards advanced analytics.
In descriptive analytics, IT worked with the data, and its processes followed a defined pattern: design, extract, transform, load, validate, then refresh. The cycle was repeated from the extract step as many times as required. The results were used to produce static reports whose templates had been defined during the design phase, and the business would refresh each report for its monthly or quarterly update. This type of analytics is about the destination of the data. When IT works on this model, a large portion of the time is spent on design. IT understands the data requirements and knows which reports to publish and where. IT can do this because it understands the organisation, and the work is historical in nature, usually reporting after the fact.
Once the model is built, much of the time is spent extracting and transforming data, for example by summarising or aggregating it. The data is then placed within the structure defined in the design phase and validated as it comes across. Once the data is in place, all that is needed is to refresh it as required and deliver the results as reports.
Once this is in place, change is minimal; occasionally, new sources or updates are required. The process is fairly rigid and well governed because the design phase has driven the environment. Ultimately, it is about where the data is going and what we do to that data to obtain our reports.
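The descriptive cycle of design, extract, transform, load, validate and refresh can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how SAS or any particular product implements it: the source is an in-memory list of transaction rows, the fixed report layout `REPORT_COLUMNS` stands in for the output of the design phase, and the validation rules in `is_valid` are hypothetical. Validation is applied as rows come across during transformation, and `refresh` simply re-runs the whole cycle for each reporting period.

```python
from collections import defaultdict

# Design phase: the report layout is fixed before any data moves (hypothetical schema).
REPORT_COLUMNS = ("region", "quarter", "total_sales")


def extract(source_rows):
    """Extract: copy raw transaction rows out of the source system."""
    return list(source_rows)


def is_valid(row):
    """Validate: hypothetical rules agreed during the design phase."""
    amount = row.get("amount")
    return (
        bool(row.get("region"))
        and bool(row.get("quarter"))
        and isinstance(amount, (int, float))
        and amount >= 0
    )


def transform(rows):
    """Transform: aggregate valid rows by (region, quarter), validating each
    row as it comes across and counting the ones rejected."""
    totals = defaultdict(float)
    rejected = 0
    for row in rows:
        if not is_valid(row):
            rejected += 1
            continue
        totals[(row["region"], row["quarter"])] += row["amount"]
    return totals, rejected


def load(totals):
    """Load: place the aggregates into the predefined report structure."""
    return [
        dict(zip(REPORT_COLUMNS, (region, quarter, total)))
        for (region, quarter), total in sorted(totals.items())
    ]


def refresh(source_rows):
    """Refresh: re-run extract -> transform -> load for a new reporting period."""
    totals, rejected = transform(extract(source_rows))
    return load(totals), rejected


if __name__ == "__main__":
    sample = [  # hypothetical transactions
        {"region": "East", "quarter": "Q1", "amount": 100.0},
        {"region": "West", "quarter": "Q1", "amount": -5.0},  # fails validation
    ]
    print(refresh(sample))
```

Everything downstream depends on `REPORT_COLUMNS` being decided first; that is the sense in which the design phase drives the environment, and why change is minimal once the pipeline is running.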
Things start to change with advanced analytics. The terminology is similar: extracting, loading, transforming and data quality. But in processing data for advanced analytics, the focus is on the data and its journey rather than its destination.
When working with data for advanced analytics, 80% of the work is preparing the data and only 20% is the specialist work itself. Lisa said she has often heard SAS chief Dr Jim Goodnight say, “The analytics is the easy part, it is always the data that is the hard part”. The reason is the preparation of the data: cleansing, validating and structuring it. As any great chef knows, preparing the ingredients is the key to delivering great outcomes.
The quality of the model is largely determined by the preparation the statistician carries out, and this preparation is what separates a good modeller from an exceptional one. The trouble is that much of that 80% is work a statistician should not be doing. A good analogy is hiring a top-class surgeon for a hospital. You pay a high price for that surgeon and their specialty, so you don't want them spending time fetching their own tools, scheduling the operating room or cleaning it up; you want them to focus on their expertise. From a data strategy perspective, the question becomes: how do we reduce that 80%, so that the data scientist's specialised skills are used where they give us the greatest return on our investment?
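To give a feel for what that 80% looks like, here is a minimal Python sketch of preparing raw customer records for a churn model like the one in Lisa's example. The field names (`customer_id`, `tenure_months`, `monthly_spend`, `churned`) and the cleansing rules are hypothetical assumptions for illustration only. The sketch cleanses each record (trimming text, coercing numbers, normalising the yes/no label), validates it by rejecting records it cannot repair, removes duplicates, and finally structures the result as a feature matrix `X` and label list `y` that a modelling tool could consume.

```python
def clean_record(raw):
    """Cleanse and validate one raw record (hypothetical rules).
    Returns a tidy dict, or None when the record cannot be repaired."""
    customer_id = str(raw.get("customer_id", "")).strip()
    if not customer_id:
        return None  # no key, so the record cannot be used
    try:
        tenure = int(str(raw.get("tenure_months", "")).strip())
        spend = float(str(raw.get("monthly_spend", "")).strip())
    except ValueError:
        return None  # unparseable numbers
    if tenure < 0 or spend < 0:
        return None  # values outside the assumed valid range
    label = str(raw.get("churned", "")).strip().lower()
    return {
        "customer_id": customer_id,
        "tenure_months": tenure,
        "monthly_spend": spend,
        "churned": label in {"yes", "y", "true", "1"},
    }


def prepare(raw_records):
    """Cleanse, validate and de-duplicate (keeping the first valid record per
    customer), then structure the data as ids, features X and labels y."""
    seen = {}
    for raw in raw_records:
        record = clean_record(raw)
        if record is not None and record["customer_id"] not in seen:
            seen[record["customer_id"]] = record
    ids = sorted(seen)
    X = [[seen[i]["tenure_months"], seen[i]["monthly_spend"]] for i in ids]
    y = [int(seen[i]["churned"]) for i in ids]
    return ids, X, y


if __name__ == "__main__":
    sample = [  # hypothetical raw extract with typical quality problems
        {"customer_id": " C1 ", "tenure_months": "12", "monthly_spend": "49.5", "churned": "No"},
        {"customer_id": "C1", "tenure_months": "12", "monthly_spend": "49.5", "churned": "no"},
        {"customer_id": "C3", "tenure_months": "n/a", "monthly_spend": "20", "churned": "yes"},
    ]
    print(prepare(sample))
```

Even in this toy version, all of the code is preparation; no modelling has happened yet.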
This paradigm shift is part of a broader drive for businesses to be analytically driven, so understanding it from a data mining perspective is important. We need to bridge the gap between the advanced analytics side of the business and IT, and IT needs to understand the requirements of advanced analytics and of managing the data behind it. When we talk about data management for analytics, we are talking about all analytics, both descriptive and advanced. Decisions that scale, open source integration, streaming analytics and approachable analytics are all themes that SAS addresses in the solutions it provides.
By understanding where you sit on the spectrum of data analytics, you can appreciate the real shift, and the benefits an organisation can gain from analytics, both descriptive and advanced. First you need the data; next you need the ability to explore it. Once you discover which models work, the question becomes: how do you incorporate those decisions and insights into what you do day to day?