👂 The Art of Listening

Putting lithium grease into wheel bearing

There is quite a bit of chatter about predictive maintenance today. Certainly, it seems that many people are in on this conversation and trying to distil the valuable information from the hype.

Fortunately, we can use our listening skills to filter out the parts of the conversation that are meaningful.

Recently, an academic group [1] made some efforts in this direction, using ubiquitous Deep Learning methods to classify different fault types in fan bearings equipped with vibration sensors. They made use of an online data set [2], whose creators “seeded” the bearings with faults of different sizes at different locations, then collected sensor measurements and made them openly available in MATLAB binary format.

The authors of [1] used the data set [2] to do standard “data science” work in building a machine-learning classifier.

While the data set is primarily useful for classifying the faults, we can do a thought experiment that explores a time-aspect of the data, explicitly asking,

Let’s imagine the fault size growing over time; can we then predict it, given the sensor data?

This is straightforward using BIOVIA Pipeline Pilot, which integrates with the two open-source data science platforms that are most in use today: Python and R.

In fact, this particular problem reveals one very nice feature of “component-based” tools such as Pipeline Pilot – namely, that we can use both Python and R in the same data pipeline.

For this problem, Python’s .mat file reader is very handy, and the pandas data manipulation and analysis package is excellent, in the opinion of this author, for feature engineering.

On the other hand, R has a very nice Hotelling package, which can help you determine the “T-Health” of a device. R also has the workhorse linear regression (lm()) method, which we used for our own work.
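The text does not define “T-Health”, so the following is only a minimal sketch of the statistic that the Hotelling name refers to: the T-squared distance of an observation from a reference sample. It is written in Python with NumPy rather than R, it is not the R package's API, and the function name hotelling_t2 and the idea of using a "healthy" baseline are assumptions made for illustration.

```python
import numpy as np


def hotelling_t2(baseline, observations):
    """Unscaled Hotelling T-squared distance of each observation from a baseline.

    baseline: (n, p) array of feature rows from a reference state
              (hypothetically, a bearing known to be healthy).
    observations: (m, p) array of feature rows to score.
    """
    baseline = np.asarray(baseline, dtype=float)
    observations = np.atleast_2d(np.asarray(observations, dtype=float))
    mean = baseline.mean(axis=0)
    # Sample covariance of the baseline features (columns are variables).
    cov = np.atleast_2d(np.cov(baseline, rowvar=False))
    diffs = observations - mean
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diffs.T).T
    # Row-wise dot product gives (x - mean)^T cov^-1 (x - mean) per observation.
    return np.einsum("ij,ij->i", diffs, solved)
```

A large value means the observation sits far from the baseline once the spread and correlation of the baseline features are taken into account. Any threshold for calling a device unhealthy would have to be chosen separately.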

Feature Engineering

As stated above, we used scipy.io to read in the .mat files (there are a few dozen), and then pandas’ grouping capabilities to calculate simple statistics over the sensor data: mean, min, max, skew and kurtosis. Sensor output at the “drive end” of the shaft was 12,000 readings per second, and we aggregated over windows of 1 or 2 seconds, depending on the length of the measurement (typically 10 to 20 seconds).
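The feature-engineering step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the pipeline actually used: the helper names load_drive_end_signal and window_statistics are hypothetical, and the "DE_time" variable-name suffix is an assumption about how the .mat files label the drive-end channel.

```python
import numpy as np
import pandas as pd
from scipy.io import loadmat

SAMPLE_RATE_HZ = 12_000  # drive-end sampling rate quoted in the text


def load_drive_end_signal(path):
    """Return the first variable whose name ends in 'DE_time' as a 1-D array.

    The 'DE_time' suffix is an assumption about the files' naming scheme;
    adjust it to match the files you actually download.
    """
    contents = loadmat(path)
    for name, value in contents.items():
        if name.endswith("DE_time"):
            return np.asarray(value, dtype=float).ravel()
    raise KeyError(f"no drive-end variable found in {path}")


def window_statistics(signal, window_seconds=1.0, sample_rate=SAMPLE_RATE_HZ):
    """Summarise a vibration signal over consecutive fixed-length windows."""
    samples_per_window = int(window_seconds * sample_rate)
    n_windows = len(signal) // samples_per_window
    # Drop the incomplete window at the end, if any.
    trimmed = signal[: n_windows * samples_per_window]
    frame = pd.DataFrame({
        "window": np.repeat(np.arange(n_windows), samples_per_window),
        "vibration": trimmed,
    })
    grouped = frame.groupby("window")["vibration"]
    stats = grouped.agg(["mean", "min", "max", "skew"])
    # Kurtosis is computed per group via Series.kurt (excess kurtosis).
    stats["kurtosis"] = grouped.apply(pd.Series.kurt)
    return stats
```

Calling window_statistics on each file's signal and tagging the resulting rows with that file's fault size and motor load would give one table of per-window statistics, ready for modelling.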

The authors [1] found that a frequency-domain analysis gave a much higher classification accuracy than a straightforward time-domain analysis. However, we did not follow their procedure of FFT and then Principal Component Analysis, although these methods are available in Pipeline Pilot.
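For readers unfamiliar with the FFT-then-PCA idea mentioned above, here is a generic NumPy sketch of that kind of procedure. It is not the cited authors' implementation and it was not part of our analysis; the function names spectrum_features and pca_scores are hypothetical.

```python
import numpy as np


def spectrum_features(signal, samples_per_window):
    """Magnitude spectrum of each consecutive window (one row per window)."""
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // samples_per_window
    windows = signal[: n_windows * samples_per_window].reshape(n_windows, -1)
    # rfft returns only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(windows, axis=1))


def pca_scores(features, n_components=2):
    """Project mean-centred rows onto their leading principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T
```

With one-second windows at 12,000 readings per second, each spectrum has 6,001 frequency bins spaced 1 Hz apart, and the PCA step compresses those bins into a handful of scores per window.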

The result of this analysis was a pandas DataFrame containing Fault Size, Motor Load and the statistics.

Statistical Analysis

When we had the “sample statistics” above, we moved the data from Python to R seamlessly using Pipeline Pilot.

Once in R, we explored various linear and nonlinear models, predicting the fault size from the motor load and the statistics. As expected, the motor load was insignificant, as were any interaction terms involving this variable. Also unsurprisingly, there was a significant correlation between the max (or min) of the vibration and the fault size. More surprisingly, this relationship was nonlinear, with a significant curvature term.
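To make the idea of a "curvature term" concrete, the sketch below fits a response against a single predictor plus its square and reports the adjusted R-squared. It is a NumPy stand-in for the kind of model one would fit with R's lm(), not the model form we actually used (which also involved kurtosis); the function name fit_with_curvature is hypothetical.

```python
import numpy as np


def fit_with_curvature(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2.

    Returns (coefficients, adjusted R-squared).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, linear term, curvature (squared) term.
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    n, p = design.shape  # p counts the intercept as a parameter
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return coefs, adj_r2
```

Here x would be something like the per-window maximum of the vibration and y the fault size. A coefficient on the squared term that is clearly different from zero is what "significant curvature" refers to, although this sketch does not compute significance tests.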

Figure 1 shows the various models.

[Figure 1: Models]

Here the filled circles are the data points, which are color-coded according to the motor load. The three curves are three different model forms, the “best” of which, the red curve, gives an adjusted R-squared of nearly 0.8, which is quite useful for predictive work. Interestingly, the kurtosis is significant in this model.


The modern data scientist confronts a plethora of different “platforms,” but frequently wants to use more than one of them. With BIOVIA Pipeline Pilot, this is easy.

Our example shows that, using both Python and R, you can predict the size of a fault in a bearing with reasonable accuracy simply by listening carefully to the vibrations.

This permits true “predictive maintenance” approaches, maximizing the uptime of assets, while also minimizing operational maintenance efforts.


[1] Funa Zhou, Yulin Gao and Chenglin Wen, “A Novel Multimode Fault Classification Method Based on Deep Learning.” Available open-access via https://www.hindawi.com/journals/jcse/2017/3583610/
[2] Case Western Reserve University, Bearing Data Center, “Seeded Fault Data.” Available online at https://csegroups.case.edu/bearingdatacenter/home

David Nicolaides

BIOVIA, Dassault Systèmes at Dassault Systemes BIOVIA
Principal Scientist at Dassault Systemes BIOVIA working to ensure the success of both BIOVIA and its customers whose main focus is software to support the process of efficient experimentation, and management of the knowledge which this generates. A secondary focus is integration - pulling together software, data and people into novel situations which generate value.
