surfer riding through the center of a wave
PHOTO: Jeremy Bishop

Marketers are increasingly using machine learning technology to help implement campaign strategies. But introducing machine learning raises programming questions that many marketing professionals understand only at a surface level. If you are a marketing professional in this situation, pipelines are a good place to start learning without being overwhelmed.

A Beginner's Guide to Machine Learning Pipelines

A pipeline is the sequence of process steps needed to build a machine learning model. Developers use "pipeline" to describe a series of stages, each feeding into the next, that carry work from source code into a production environment. If you research software development, you will likely see pipelines offered by many programming services. For example, Azure Pipelines connects Azure cloud services to repositories like GitHub and Bitbucket. Solutions like this are meant to establish an integrated environment for development workflow and offer specific features that work with other related cloud services. For machine learning, pipelines cover how the data is prepared statistically and how the parameters of the resulting model are set.

A machine learning pipeline generally consists of several steps, but if you are just starting out with machine learning, it might be easier to group those steps into three parts.

It starts with data acquisition. What data do you need for your model? What source will you pull your data from? After determining this, you have to set up a connection to the data lake or database where the data is housed.
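As a rough illustration of that first step, here is a minimal Python sketch. It assumes the data sits in a SQL database; an in-memory SQLite database stands in for a real data lake or warehouse, and the table name `campaign_results`, its columns and the sample rows are all made up for the example.

```python
import sqlite3

# Hypothetical stand-in for a real data source: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_results (channel TEXT, spend REAL, converted INTEGER)"
)
conn.executemany(
    "INSERT INTO campaign_results VALUES (?, ?, ?)",
    [
        ("email", 120.0, 1),
        ("social", 300.0, 0),
        ("search", None, 1),  # a missing spend value
        ("email", 95.0, 0),
    ],
)


def load_rows(connection):
    """Pull only the columns the model needs, as a list of dictionaries."""
    cursor = connection.execute(
        "SELECT channel, spend, converted FROM campaign_results ORDER BY rowid"
    )
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


rows = load_rows(conn)
print(len(rows), "rows loaded")
```

In a real project the connection details would come from your developer or IT team. The marketer's contribution is deciding which table and which columns belong in the query.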

Next, it's time to make some decisions about the data and the model. The data is processed statistically, with tasks such as removing outliers and potentially substituting a mean for a few missing data points. Transformation choices determine the lines of code that control how the model reads the data. This means row preparation, column preparation and data value changes. For example, you may have to run a one-hot encoding step to change categorical data into numeric values, eliminating text from the observations entering the model.
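The sketch below shows what those three decisions might look like in plain Python. The sample rows, the column names and the fixed outlier cutoff of 1,000 are assumptions chosen for illustration. Production pipelines usually rely on a library such as pandas or scikit-learn and a statistically chosen cutoff.

```python
from statistics import mean

# Hypothetical sample rows; None marks a missing spend value.
rows = [
    {"channel": "email", "spend": 120.0},
    {"channel": "social", "spend": 300.0},
    {"channel": "search", "spend": None},
    {"channel": "email", "spend": 95.0},
    {"channel": "social", "spend": 9000.0},  # implausibly large value
]


def drop_outliers(rows, column, upper_limit):
    """Keep rows whose value is missing or no larger than upper_limit."""
    return [r for r in rows if r[column] is None or r[column] <= upper_limit]


def fill_missing_with_mean(rows, column):
    """Replace missing values with the mean of the known values."""
    fill_value = mean(r[column] for r in rows if r[column] is not None)
    filled = []
    for r in rows:
        value = fill_value if r[column] is None else r[column]
        filled.append({**r, column: value})
    return filled


def one_hot_encode(rows, column):
    """Swap a text column for one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


cleaned = drop_outliers(rows, "spend", upper_limit=1000.0)
filled = fill_missing_with_mean(cleaned, "spend")
prepared = one_hot_encode(filled, "channel")
print(prepared[0])
```

After these steps every value in `prepared` is numeric: the `channel` text column has become three indicator columns, one per channel.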

The final steps examine model performance. The process consists of fitting the model on training data and running it on test data to establish model specifications. It also involves verifying the accuracy and precision of the results, with the option to tune model parameters as needed.
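To make the evaluation step concrete, here is a small sketch that splits rows into training and test sets and scores a set of predictions. It takes "precision" in its classification sense (the share of predicted positives that were truly positive), which is an assumption about the intended meaning. The labels and predictions are invented, and no real model is trained here.

```python
import random


def train_test_split(rows, test_fraction, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(actual, predicted):
    """Share of all predictions that match the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def precision(actual, predicted):
    """Share of predicted positives (1s) that are actually positive."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    predicted_pos = sum(1 for p in predicted if p == 1)
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical test-set labels (1 = converted) and a model's predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 1]

train, test = train_test_split(list(range(8)), test_fraction=0.25)
print(len(train), "training rows,", len(test), "test rows")
print("accuracy:", accuracy(actual, predicted))
print("precision:", precision(actual, predicted))
```

Here the model gets 5 of 8 predictions right (accuracy 0.625), and 3 of its 5 predicted conversions are real (precision 0.6). Numbers like these tell the team whether the parameters need another round of tuning.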

How Marketers Can Help

A marketer may not know enough Python or R programming to create a pipeline. Many of those tasks should be accomplished in collaboration with a developer or IT professional. But a marketer can assist in key decisions that feed these programs, such as establishing the initial data variables and their data sources. In a previous post on dimension reduction, I explained why selecting too many variables can make a model untrainable. Marketers can therefore start by eliminating variables that are irrelevant to the question the model is meant to answer.

Another point where marketers can help is in determining the model documentation. Here, documentation means the configuration files that tell the pipeline which functions to run and when. YAML is a data serialization language, widely used for configuration files, that uses key-value pairs to give the model its initial parameter conditions. The pairs express the same kind of nested structure found in JSON and XML files. Most current IDEs (integrated development environments), such as Visual Studio Code, can be used to create YAML files. Their editing features are useful for debugging nested key-value pairs, which are common in complex applications. Training sites like Tutorials Point can teach the basics and show examples.
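For a sense of what those nested key-value pairs look like, the sketch below places a small YAML snippet next to the same settings written as a Python dictionary and as JSON. Every key and value is hypothetical. The YAML text is shown only for comparison and is not parsed here; reading a real YAML file in Python usually requires a third-party library such as PyYAML.

```python
import json

# Hypothetical settings as they might appear in a YAML file.
# Indentation expresses nesting. Shown for comparison only; not parsed here.
YAML_TEXT = """\
model:
  type: logistic_regression
  parameters:
    learning_rate: 0.01
    max_iterations: 200
data:
  source: campaign_results
  test_fraction: 0.25
"""

# The same nested key-value pairs as a Python dictionary.
config = {
    "model": {
        "type": "logistic_regression",
        "parameters": {"learning_rate": 0.01, "max_iterations": 200},
    },
    "data": {"source": "campaign_results", "test_fraction": 0.25},
}

# The same structure again as JSON, which uses braces instead of indentation.
json_text = json.dumps(config, indent=2)
print(YAML_TEXT)
print(json_text)

# Nested values are reached one key at a time.
learning_rate = config["model"]["parameters"]["learning_rate"]
print("learning rate:", learning_rate)
```

A marketer reviewing a file like this can check that the data source and the held-out test fraction match what the team agreed on, without reading any of the pipeline code.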

Pipelines can reveal what marketers should know to better manage their machine learning models alongside more technical professionals. The need to collaborate with technical teams is fast establishing a marketing ops discipline, a complement to MLOps and DevOps. With the rising demand for tech-savvy marketers, professionals will do well to add pipeline tasks to their professional development plans in the days ahead.