Marketing analysts often face a dilemma when choosing a data visualization: managers and colleagues want advanced insights from their data, yet those insights require a better understanding of the statistics that make them useful.
The key to reaching those managers without overwhelming them with machine learning minutiae is leveraging a basic graphic as a simple introduction to concepts.
Time series graphs with applicable data can be that introduction. They can be a terrific starting point for discussing machine learning projects.
How Time Series Became a Valuable Analytics Report
Time series graphs are very intuitive. They help you relate a metric to time. Most business managers appreciate this perspective, because it allows them to examine operational performance, using a visual such as a line graph showing how revenue in each quarter has increased or decreased.
Time series graphs are typically seen within social media analytics and web analytics dashboards. No matter which campaigns you’ve worked on, you have likely seen time series data from web analytics solutions or within social media analytics reports.
For example, in a web analytics solution like Google Analytics, you would look at the time series results in a referral traffic report to see what sources are consistently sending traffic to a website.
Discovering how many clicks an ad campaign received or how many website visits occurred in a sales period is a basic question that a time series graph can answer.
The reports are designed to make changes in a metric over time visually intuitive. You can quickly sense whether a particular metric is increasing or decreasing within a time period, whether it is reported as a percentage or as a comparison of discrete numbers.
But time series graphs in most user-friendly analytics solutions have a serious limitation. If you need to know how sustainable a trend is, the statistical details are not immediately visible in the tools.
This means the data spikes and declines appear without showing the influences that caused them. How those surface trends are interpreted can make or break major forecasting decisions.
For example, if a type of car in a market is showing sales growth every quarter, then as a car manufacturer, I would want to know if that sales data trend is sustainable to justify investment in a plant or to enter a manufacturing partnership to produce a vehicle for that market.
Establishing sustainability is especially critical for using time series data as training data for machine learning forecast models.
Demand for time series forecasting is common among retailers like Walmart and Target. Retailers must track product shipments from their distribution centers to their stores, so even a small improvement in demand forecasting can cut costs and enhance product availability as part of the customer experience.
How to Start an Advanced Time Series Analysis
To get a sense of a trend and sustainability, you need more statistical analysis. Doing so identifies the right trend within the data. Time series data always contain noise — dips and spikes, which get incorporated into a templated graph.
Dips and spikes do not always reflect an overall significant trend change in traffic behavior, however. So, you need to separate the data's true trending signal from the noise that masks it.
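One simple way to see the difference between signal and noise is a moving average, which replaces each point with the average of its recent neighbors. The sketch below is only an illustration in plain Python: the `moving_average` helper and the `weekly_visits` figures are hypothetical, not taken from any analytics tool.

```python
def moving_average(values, window):
    """Return the trailing moving average of `values`.

    The result is shorter than the input because the first
    `window - 1` points do not have enough history to average.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running_total = sum(values[:window])
    averages = [running_total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest point, drop the oldest.
        running_total += values[i] - values[i - window]
        averages.append(running_total / window)
    return averages


# Hypothetical weekly site visits: a steady climb with one promotional spike.
weekly_visits = [100, 104, 98, 110, 180, 112, 116, 109, 121, 125]
smoothed = moving_average(weekly_visits, window=4)

for week, value in enumerate(smoothed, start=4):
    print(f"week {week}: {value:.2f}")
```

The one-week spike of 180 visits is spread across several smoothed points, so the underlying climb is easier to see. A wider window smooths more but reacts more slowly to real changes.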
For example, financial market analysts rely on advanced time series models to extract trends from rapidly changing prices and volume of stocks and commodities.
A statistical time series analysis answers two questions:
Does the trend in the dataset indicate a steady pattern?
Does the trend in the dataset correlate only to the given time period?
The answers are extremely important if the data is being used in a regression or machine learning forecasting model.
You can create this kind of statistical analysis using R or Python. R in particular has well-developed support for time series and geographic data. It treats time series data as a special kind of programming object.
An object is essentially a container that holds the data you want to work with in a form the language can read. In R, the basic data object is a vector.
A time series object (class ts) is a special version of an R vector that also stores the start time and the frequency of the observations, which makes it a convenient format for forming graphs and conducting analyses.
To answer the first question, the indication of a pattern, you need to import the data into a tool or program to assess the time series trend from a statistical perspective.
For example, you decompose time series data to split it into three derived graphs: one for the trend, one for the seasonal pattern and one for the random remainder that is left after the first two are removed. The image below displays what this looks like in R.
A decomposed time series allows you to see if there is a true seasonal trend and check the volatility. A decomposed time series graph is helpful for analysis related to seasonal campaigns, which I explain in this post.
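To make the idea concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is not the code behind R's decomposition plot. The `decompose_additive` helper and the `quarterly_sales` numbers are hypothetical, and the data is built from a known trend and seasonal pattern so that the split is easy to check.

```python
def decompose_additive(values, period):
    """Split a series so that observed = trend + seasonal + remainder."""
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centered moving average spanning one full period.
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points each get half weight.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Seasonal: average detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    pattern = [r - mean_raw for r in raw]  # center so the pattern sums to zero
    seasonal = [pattern[t % period] for t in range(n)]

    # Remainder: whatever the trend and seasonal parts do not explain.
    remainder = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Hypothetical quarterly sales: linear growth plus a repeating quarterly pattern.
known_pattern = [10, -5, -15, 10]
quarterly_sales = [100 + 2 * t + known_pattern[t % 4] for t in range(12)]

trend, seasonal, remainder = decompose_additive(quarterly_sales, period=4)
print("seasonal pattern:", seasonal[:4])
```

The trend is undefined (`None`) for the first and last two quarters because a centered average needs data on both sides. On real data the remainder would not be zero, and its size gives a rough sense of volatility.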
For the second question, you address how much the data depends on when it was observed by verifying whether the data is stationary. Stationarity is a condition in which the statistical properties of the series, such as its mean and variance, do not change over time. Stationarity ensures that the patterns in the data are consistent, so they hold regardless of the time period in which they were observed.
To assess this in time series data, you would use an augmented Dickey–Fuller (ADF) test. An ADF test checks whether a dataset has a unit root, a property that indicates the data is not stationary. Python and R both have a number of libraries that can run the test.
The tseries library in R, for example, can run a test function, adf.test(), on basic time series data. This image shows the result as a p-value. The test's null hypothesis is that the data has a unit root, so a small p-value (conventionally below 0.05) suggests the data is likely stationary, while a larger p-value means you cannot rule out nonstationarity (I am keeping the hypothesis testing details simple for this post).
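For readers who want to see what such a test does under the hood, the sketch below implements the basic Dickey–Fuller regression in plain Python. It is a simplified stand-in, not the augmented test that adf.test() runs: it has no lagged difference terms and it compares the statistic with an approximate 5% critical value instead of computing a p-value. The helper name, the simulated series and the rounded critical value of -2.89 (for a regression with a constant and a sample of this rough size) are assumptions made for illustration.

```python
import math
import random


def dickey_fuller_stat(values):
    """t-statistic on gamma in: change[t] = alpha + gamma * value[t-1] + error."""
    lagged = values[:-1]
    changes = [values[i] - values[i - 1] for i in range(1, len(values))]
    n = len(changes)
    mean_x = sum(lagged) / n
    mean_y = sum(changes) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, changes))
    gamma = sxy / sxx
    alpha = mean_y - gamma * mean_x
    residual_ss = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, changes))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return gamma / standard_error


# Approximate 5% critical value (assumption: constant, no trend term).
CRITICAL_5PCT = -2.89

rng = random.Random(42)
shocks = [rng.gauss(0, 1) for _ in range(200)]

# Stationary example: pure noise around a fixed level.
noise_series = list(shocks)

# Nonstationary example: a random walk that accumulates the same shocks.
walk_series = []
level = 0.0
for shock in shocks:
    level += shock
    walk_series.append(level)

noise_stat = dickey_fuller_stat(noise_series)
walk_stat = dickey_fuller_stat(walk_series)

for name, stat in [("noise", noise_stat), ("random walk", walk_stat)]:
    verdict = "looks stationary" if stat < CRITICAL_5PCT else "unit root not ruled out"
    print(f"{name}: statistic {stat:.2f} -> {verdict}")
```

A statistic far below the critical value is evidence against a unit root. For real work, a library routine such as adf.test() in R or `adfuller` in Python's statsmodels is the better choice, since it adds the lag terms and reports a p-value.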
Understanding the Noise
R has several libraries that test for stationarity, plus other techniques. For example, the base R function diff() computes the differences between consecutive observations, which is a common way of converting a nonstationary dataset into a stationary one.
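Differencing is simple enough to sketch in a few lines of Python. The `difference` helper below is a hypothetical illustration that mirrors the default behavior of R's diff() (each value minus the one before it), and the `monthly_orders` numbers are made up.

```python
def difference(values, lag=1):
    """Return values[i] - values[i - lag] for every i with enough history."""
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Hypothetical monthly orders with a clear upward trend.
monthly_orders = [200, 210, 220, 235, 245, 260, 270, 285]

changes = difference(monthly_orders)
print(changes)  # [10, 10, 15, 10, 15, 10, 15]
```

The original series keeps climbing, but the month-to-month changes hover around a stable level, which is the property a stationary series needs. Passing a larger lag, such as `lag=12` on monthly data, compares each month with the same month a year earlier and removes a yearly seasonal pattern in the same way.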
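You also have a few modeling frameworks for time series data. The most popular are the autoregressive moving average (ARMA) and the autoregressive integrated moving average (ARIMA).
ARMA and ARIMA models describe each value in terms of past values of the series (the autoregressive part) and past forecast errors (the moving average part), where the errors are assumed to be white noise. This helps highlight an existing correlation in the trending data. ARIMA adds a differencing step, so it can also handle nonstationary data.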
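As a rough illustration of how the pieces fit together, the sketch below builds a stripped-down relative of an ARIMA model in plain Python: one differencing step, one autoregressive term and no moving average term (often written ARIMA(1,1,0)), fitted by ordinary least squares. Real libraries estimate these models differently and handle the moving average part as well. The helper names and the `monthly_orders` data are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of series[t] = c + phi * series[t-1] + error."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((a - mean_prev) ** 2 for a in previous)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(values):
    """One-step forecast: difference, fit AR(1), then undo the differencing."""
    changes = [values[i] - values[i - 1] for i in range(1, len(values))]
    c, phi = fit_ar1(changes)
    next_change = c + phi * changes[-1]
    return values[-1] + next_change, c, phi


# Hypothetical monthly orders with an upward trend.
monthly_orders = [200, 210, 220, 235, 245, 260, 270, 285]

forecast, c, phi = forecast_next(monthly_orders)
print(f"phi = {phi:.2f}, c = {c:.2f}, next month forecast = {forecast:.1f}")
```

The negative `phi` reflects how the month-to-month changes in this toy data alternate between larger and smaller steps. With only eight observations the fit is purely illustrative, and a real forecast would need far more history plus a check of the residuals.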
Understanding noise within time series data is an advanced tactic that may or may not be a beneficial exploration for a given business question. Managers often glance at trending data just to have an idea of emerging trends for immediate needs.
But today's push for better supply chain and demand analysis has raised marketers' interest in building accurate forecasts that affect the activities involved in delivering customer experience.
If you’re one of those marketers, you must dig further into time series datasets to verify data conditions and gain meaningful answers that will make your major business decisions better.