Applying Data Science in Manufacturing: Part I - Background and Introduction

Ryan Monson
Jun 13, 2020

Data is like garbage: you better be sure what you’re going to do with it before you collect it. - Mark Twain

This is a four-part post:

-Part I — Background and Introduction
-Part II — Batch Processing: Methodology and Lessons Learned
-Part III — Continuous Processing: Methodology and Lessons Learned
-Part IV — Summary and Conclusions

After 30+ years with various Manufacturing organizations as a New Product Development, Process and Quality Engineer, plus a stint as an Organizational Development Consultant, I temporarily left industry to pursue education in Data Science.

My course of study included 20 guided projects: assignments in which a public dataset was analyzed to answer a question or solve a problem. None of the guided projects were associated with Manufacturing. That’s understandable, given the conservative culture in that sector of the economy. Data is closely guarded. Records are still completed with pen and paper. New technologies are employed cautiously and only when necessary, rarely as an opportunity.

Manufacturing processes were data-rich environments long before Internet connectivity made finance, retail, health care, etc. data-rich environments. For many years, manufacturing facilities have been electronically gathering and storing hundreds of thousands, even millions, of measurement results daily on process temperature, pressure, flow rate, rpm, amperage, etc. Those measurement results are used for process control. Boundaries are established for the measurement results, and most process control systems continuously attempt to hold the results at predetermined set points.

Occasionally in my career we’d have output from the manufacturing process that did not meet requirements. Problem investigation always included looking at the process measurement results, but we didn’t know what to do with such large amounts of data. All of our training and education was on statistical inference from small samples. We’d never been taught how to analyze large datasets.

About halfway through my Data Science studies I had an epiphany: is it possible to use process measurements for more than just boundary control? Modeling with machine learning methods may allow parameter requirements to move from being bounded within a range to being bounded by their relationship to each other. For manufacturing processes those relationships were previously established through theoretical chemistry, physics and laboratory experimentation but, in my experience, the interrelationships are never transferred to the process control system.

For example, suppose it’s learned in the laboratory that high temperature at step 1 in the process leads to increased product impurity (but still within specification), and that the increased impurity can be removed by low temperature at step 4. When the process is eventually transferred to manufacturing, the control system will establish an acceptable measurement range for temperature at step 1 and a range for temperature at step 4, but it will not establish a relationship between them to reduce impurity variation (i.e. if the temperature at step 1 is at the high end of its range, the system will not tell step 4 to operate at the lower end of its range).
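To make the difference concrete, here is a minimal Python sketch of the two approaches. Everything in it is an illustrative assumption: the temperature ranges, the names `STEP1_RANGE` and `STEP4_RANGE`, and the simple inverted linear mapping are hypothetical and do not come from any real process or control system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """An acceptable operating range for one process parameter."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

    def clamp(self, value: float) -> float:
        return max(self.low, min(self.high, value))


# Hypothetical ranges in degrees C (assumptions for illustration only).
STEP1_RANGE = Range(180.0, 200.0)
STEP4_RANGE = Range(40.0, 60.0)


def range_only_ok(t1: float, t4: float) -> bool:
    """Boundary control: each temperature is checked on its own."""
    return STEP1_RANGE.contains(t1) and STEP4_RANGE.contains(t4)


def coupled_step4_setpoint(t1: float) -> float:
    """Relationship-based control: a hot step 1 asks for a cool step 4.

    Maps the step 1 reading linearly onto the step 4 range, inverted,
    so the high end of step 1 pairs with the low end of step 4.
    """
    span1 = STEP1_RANGE.high - STEP1_RANGE.low
    frac = (STEP1_RANGE.clamp(t1) - STEP1_RANGE.low) / span1
    return STEP4_RANGE.high - frac * (STEP4_RANGE.high - STEP4_RANGE.low)


# Both readings are in range, so boundary control sees no problem,
# even though this is the worst pairing for impurity in the example.
print(range_only_ok(200.0, 60.0))
# The coupled rule would instead steer step 4 to the low end of its range.
print(coupled_step4_setpoint(200.0))
```

A real process would rarely follow a straight-line rule like this one; the point is only that the step 4 target becomes a function of the step 1 reading instead of a fixed band.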

I hypothesized that process variation could be reduced by establishing these relationships between variables within the process control system. It seemed that modeling with machine learning techniques could identify those relationships, which could then be programmed into the process control system.
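As a rough illustration of that idea, the sketch below fits a plain two-variable linear regression to synthetic data and turns the fitted coefficients into a compensation rule. The data are made up, the coefficients 0.02 and 0.03 are arbitrary, and ordinary least squares is only a stand-in for whatever modeling technique would suit a real process; the function names are hypothetical.

```python
import random


def fit_two_variable_ols(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 (normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def simulate_process(n=500, seed=0):
    """Generate SYNTHETIC readings; none of these numbers are real data."""
    rng = random.Random(seed)
    t1 = [rng.uniform(180.0, 200.0) for _ in range(n)]  # step 1 temperature
    t4 = [rng.uniform(40.0, 60.0) for _ in range(n)]    # step 4 temperature
    impurity = [
        0.5 + 0.02 * (a - 180.0) + 0.03 * (b - 40.0) + rng.gauss(0.0, 0.01)
        for a, b in zip(t1, t4)
    ]
    return t1, t4, impurity


def compensation_ratio(b1, b2):
    """Degrees to move step 4 per degree of step 1 drift so that the
    predicted impurity stays constant (b1*dt1 + b2*dt4 = 0)."""
    return -b1 / b2


t1, t4, impurity = simulate_process()
b0, b1, b2 = fit_two_variable_ols(t1, t4, impurity)
print(f"fitted step 1 effect: {b1:.4f}, fitted step 4 effect: {b2:.4f}")
print(f"step 4 adjustment per degree of step 1: {compensation_ratio(b1, b2):.3f}")
```

The fitted ratio is the kind of relationship that could, in principle, be handed to the control system in place of two independent ranges, subject to the engineers confirming that it agrees with the underlying physics.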

I’d already experienced significant benefits of process variation reduction in my career. In one case the variation reduction had no capital costs and minimal labor costs yet resulted in a large throughput gain. The ROI was so impressive and obvious no one bothered to calculate it.

After completing my Data Science coursework I found an alloy manufacturing dataset on Kaggle. The dataset owner was looking for the ability to predict alloy grade. The process had twenty-seven manufacturing process parameters. I felt this would be a good opportunity to practice my newly acquired knowledge on a Manufacturing problem.

I knew there would be false starts, dead ends and errors in thinking during this analysis. It would be different from my coursework projects. There was one principle, however, that I knew had to hold when I was done: the model parameters must make sense to the engineers and operators from both the physics and the operational perspective. The model had to make sense when compared to physics theory. The model also had to be “operational”: any parameter adjustments it suggested had to be doable. I’d seen engineers offer recommendations, based on their analysis, that were not doable. Their credibility was shot, and the process improvement opportunity was put on the back burner. The model would likely suggest actions contrary to the preconceived notions and intuition of engineers and operators. Overcoming resistance to change would require their buy-in.

In Part II I’ll discuss the analysis process used on the alloy manufacturing dataset and the lessons learned.

The author currently resides in Pittsburgh, PA and has worked in the Aerospace, Semiconductor, Textile and Medical Device Industries. He can be reached at


