The “Smart” manufacturing revolution is here. It has proved its effectiveness in Predictive Maintenance, Quality Control, Logistics and Inventory, Product Development, and Cybersecurity, and, paired with IoT, it expands into Robotics, Control Systems, and Digital Twins.
Manufacturers who want to build a data science team and operationalize machine learning models must build an MLOps practice. And by must, I mean this is essential, not optional. I’ve seen what happens when a team of unskilled engineers attempts to take shortcuts on the journey to put machine learning into production without MLOps and Responsible AI.
Why do you need MLOps?
First, I think it’s important to understand the benefits of machine learning to manufacturers. I’ve already touched on some of the areas where Machine Learning is helpful. But let’s talk about the value that MLOps brings to your organization.
- It helps reduce the risk of data science by making the work of scientists more transparent, trackable, and understandable. When coupled with a Responsible AI governance policy, MLOps can help manufacturing leaders understand the risks of the models running in their environments.
- Generate greater, long-term value. Modern software development has taught us that development doesn’t end with production. Systems must be monitored to ensure they are meeting the needs of the business, even as that business shifts and changes under market pressures. Machine Learning is no different. In fact, models must be reviewed and maintained on a regular cadence. Unlike software, which is relatively static in nature, an ML model is code plus data. As data changes, so must the model. It is an evolving system and must be observed and maintained even after production deployment.
- Scaling is one of the big problems of ML. It’s fairly easy to monitor one or two models in production manually, but once you scale to hundreds, even thousands of models, you will need systems in place to support the required maintenance of models.
- Maintenance is another thing to consider. Models must be updated, sometimes many times a day. This means you need a delivery pipeline from initial hypothesis to production that is fast, scalable, and includes all the tests and metrics-gathering systems needed to monitor and control the lifecycle of the model.
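To make the monitoring idea above concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), which compares a feature’s distribution at training time with its distribution in production. The data, feature, and the 0.1–0.2 alert range are illustrative assumptions, not a prescription:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one feature. PSI above roughly 0.1-0.2 is a
    common rule-of-thumb signal of drift (a convention, not a standard)."""
    # Bin edges come from the reference (training-time) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins with a tiny value to avoid log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

# Hypothetical sensor feature: production data has drifted by half a sigma.
rng = np.random.default_rng(42)
training_sample = rng.normal(loc=0.0, scale=1.0, size=5000)
production_sample = rng.normal(loc=0.5, scale=1.0, size=5000)
print(f"PSI: {population_stability_index(training_sample, production_sample):.3f}")
```

A check like this would run on a schedule against live telemetry, with an alert wired to the retraining pipeline when the index crosses the agreed threshold.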
The Machine Learning Lifecycle and Why It’s Challenging
I’m not going to dive into the Machine Learning Lifecycle in this article, but I will point out a few areas that make it challenging for a manufacturer’s traditional IT department.
- Data is a dependency. Very few ML models are of any value without a powerful data system to provide data for the learning portion of machine learning. Manufacturers often have data. Lots of data, in fact. Many manufacturers have been collecting telemetry data from operational technology for many years. But often that data is locked away in proprietary systems or behind multiple firewalls. There are tools to work with systems on-premises, and many data scientists are happy building experiments on their laptops. But this doesn’t scale to larger datasets or working with streaming data. Ideally, some type of lambda architecture will be needed to gather, prepare, and process the data for the machine learning process. Often, it’s preferred that this system live in the cloud, where it’s accessible to entire teams of data scientists and data engineers and can leverage the compute power of the cloud to build and operationalize ML models.
- Domain language issues. Data science brings a whole new set of tools and ideas to the traditional IT department. Data scientists are rarely software developers. They work in notebooks and use programming languages like R, Python, and Julia. If your IT department isn’t used to this tooling and these processes, moving from a Jupyter Notebook to an inferencing endpoint hosted on Kubernetes might be a challenge.
- Again, data scientists are not programmers. Some can program, but the code they write is typically in support of training and testing ML models. It’s not generally written to be robust enough to stand up to the software requirements demanded of a microservice. Your data scientists will likely need support from engineers who understand the deployment side of data science. You will need a team of people to support your ML journey.
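As a small illustration of that last point, here is a hedged sketch of the kind of defensive wrapper engineers typically add around a data scientist’s raw predict function before exposing it as a service. The model, feature names, and scoring formula are invented for illustration:

```python
import logging
from typing import Mapping

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")

# Hypothetical stand-in for a trained model's predict function.
def raw_predict(temperature: float, vibration: float) -> float:
    return 0.7 * temperature + 0.3 * vibration

REQUIRED_FEATURES = ("temperature", "vibration")

def predict(payload: Mapping[str, float]) -> dict:
    """Validate the request, call the model, and log the outcome,
    so bad inputs fail loudly instead of producing silent garbage."""
    missing = [f for f in REQUIRED_FEATURES if f not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")
    values = {f: float(payload[f]) for f in REQUIRED_FEATURES}
    score = raw_predict(**values)
    logger.info("prediction=%s inputs=%s", score, values)
    return {"prediction": score, "inputs": values}

print(predict({"temperature": 1.0, "vibration": 2.0}))
```

In production this layer would live behind an HTTP endpoint; the point is that input validation, logging, and error handling are engineering work layered on top of the notebook code, not something the notebook code usually provides.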
MLOps and DevOps
MLOps and DevOps do have a lot in common. They both focus on some of the same things, so if you already have a robust DevOps process in place in your organization, taking on MLOps will be less of a challenge. However, if your IT department has never heard of DevOps, you might be in for a harder journey than expected.
- DevOps is a proponent of automation. The idea is to reduce the friction from software development to production deployment by adding systems that automate the manual steps of the deployment process. MLOps has the same goals.
- DevOps is a team practice built on trust within the teams. This includes increased collaboration, communication, and an overall understanding from all the teams of the service life-cycle. Developers understand the requirements and rigors of operations and do their best to bake in what’s needed to support the operations members of their team. Operations understand the business value of new features and systems and know how to monitor telemetry and systems to ensure risks are minimized. MLOps shares some of these same ideas, with an even greater emphasis on monitoring the model in production.
- Both MLOps and DevOps prioritize continuous deployment, experimentation, observability, and resilient systems.
The big difference between these two systems:
- DevOps tends to deliver code
- MLOps delivers code and data
Use MLOps to Reduce Risk
MLOps can help protect you from risks associated with Machine Learning. As a practice, MLOps encourages many of the same things that DevOps does, like teamwork, open and honest visibility into work, standard operating procedures, and a break in dependencies from traditional siloed IT operations.
- Personnel dependency risks. What if your data scientist leaves the organization? Can you hire someone to take over that role?
- Model accuracy can degrade over time. Without systems in place to monitor models and to quickly deploy new ones, you might find that your systems are compromised because they are delivering bad predictions or responding incorrectly to certain events.
- What if there is a high dependency on the model and it’s not available? When systems depend on the accuracy of a model and that model isn’t available, could it bring production to a halt? Or worse, put people’s lives at risk? These are all considerations that must be made when evaluating whether to put an ML model into production.
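One common way to mitigate the availability risk above is a fallback path: if the model call fails, the caller degrades to a conservative default instead of halting the line. A minimal sketch, with an invented model call and a default that domain experts would have to choose:

```python
def safe_predict(model_call, features, fallback_value):
    """Return the model's prediction, or a conservative default if the
    model call fails, so downstream systems keep running."""
    try:
        return model_call(features), "model"
    except Exception as exc:  # in practice, catch specific error types
        # Log and alert here so the outage is visible, not silent.
        return fallback_value, f"fallback ({exc})"

# Hypothetical model whose inference endpoint is down.
def flaky_model(features):
    raise TimeoutError("inference endpoint unreachable")

value, source = safe_predict(flaky_model, {"rpm": 1800}, fallback_value=0.0)
print(value, source)
```

Whether a static fallback is acceptable is itself a risk decision; for safety-critical predictions the right fallback may be “stop the process and alert a human” rather than a default value.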
Taking on MLOps
MLOps is not free. There are operational costs associated with the practice. This is not a cost you can avoid: too much depends on the accuracy of your models, especially in a production environment, to take on the risks without the assurances that a good MLOps practice brings to manufacturers.
Here are the important things to remember:
- Pushing a model to production is not a final step. It’s one step in the life-cycle of a model.
- Monitor model performance and make sure it’s meeting the expected accuracy requirements
- Use Responsible AI practices
- Again, MLOps is not optional, a nice-to-have, or an afterthought. If you are releasing models to production, you must start an MLOps practice.
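The monitoring item in the checklist above can be as simple as a rolling-window accuracy tracker that flags when the model falls below the agreed requirement. A minimal sketch, with an invented window size and threshold:

```python
from collections import deque

class AccuracyMonitor:
    """Track rolling accuracy over the last `window` labeled predictions
    and flag when it drops below a required threshold."""
    def __init__(self, window=100, threshold=0.9):
        self.results = deque(maxlen=window)  # True/False per prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    @property
    def accuracy(self):
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_attention(self):
        # Only alert once the window is full, to avoid noisy early alarms.
        return (len(self.results) == self.results.maxlen
                and self.accuracy < self.threshold)

monitor = AccuracyMonitor(window=10, threshold=0.8)
for predicted, actual in [(1, 1)] * 7 + [(1, 0)] * 3:
    monitor.record(predicted, actual)
print(monitor.accuracy, monitor.needs_attention())  # 0.7 True
```

In a real MLOps setup this signal would feed an alerting system and, ideally, trigger the retraining pipeline rather than wait for a human to notice.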