
Automatic ML Retraining with Digital Twins

The Need for Machine Learning in Streaming Analytics

We depend on a wide range of complex systems, including transportation networks, building sensors, security systems, power grids, and much more, to keep our modern world moving smoothly. These systems generate countless streams of data that managers must continuously analyze to spot emerging issues and create effective responses.

As discussed in previous posts, real-time digital twins offer a powerful software technique for tackling this challenge. Hosted in memory on a scalable computing platform like ScaleOut Digital Twins™, thousands or even millions of digital twins can track individual data streams and look for issues with the corresponding physical data sources. As messages flow in, each digital twin combines them with known information about a particular data source and analyzes the data within a few milliseconds. It can immediately alert personnel when a problem arises.

However, writing code that performs streaming analytics is easier said than done. For example, consider a generator that emits telemetry with multiple parameters, such as RPM, temperature, and voltage. Subtle combinations of values induced by a variety of underlying causes can indicate emerging issues, and it can be almost impossible to spot these dynamics using hand-written analytics algorithms.

Luckily, machine learning (ML) algorithms can tackle this task very nicely. As described in an earlier blog post, a type of ML algorithm called a binary classification algorithm can identify anomalies in telemetry after being trained on many examples of real-world data. So, we can select a binary classification algorithm, train it on previously collected telemetry that has been classified as either normal or anomalous, and then deploy it in real-time digital twins to analyze a set of generators:

Using machine learning for anomaly detection

Note that each digital twin independently runs an ML algorithm and examines a telemetry stream from a specific generator. When it detects an anomaly, it can send an alert to personnel or a message back to the generator to shut it down if necessary.
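
To make the one-twin-per-generator idea concrete, here is a minimal Python sketch. It is not the ScaleOut API: `GeneratorTwin`, `dispatch`, the `SHUTDOWN_AFTER` escalation policy, and the temperature-threshold stand-in for a trained classifier are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

Classifier = Callable[[Sequence[float]], bool]  # True means "anomalous"

# Assumed policy: alert on an anomaly, shut down after 3 in a row.
SHUTDOWN_AFTER = 3


@dataclass
class GeneratorTwin:
    """Holds per-generator state and runs the classifier on each message."""
    generator_id: str
    classify: Classifier
    consecutive_anomalies: int = 0

    def handle(self, telemetry: Sequence[float]) -> Optional[str]:
        if not self.classify(telemetry):
            self.consecutive_anomalies = 0
            return None
        self.consecutive_anomalies += 1
        if self.consecutive_anomalies >= SHUTDOWN_AFTER:
            return "shutdown"  # message back to the generator
        return "alert"         # notify personnel


def dispatch(twins: Dict[str, GeneratorTwin], generator_id: str,
             telemetry: Sequence[float], classify: Classifier) -> Optional[str]:
    """Route a message to the twin for its generator, creating it on first use."""
    if generator_id not in twins:
        twins[generator_id] = GeneratorTwin(generator_id, classify)
    return twins[generator_id].handle(telemetry)


def too_hot(telemetry: Sequence[float]) -> bool:
    """Stand-in for a trained model; telemetry is (rpm, temperature, voltage)."""
    return telemetry[1] > 90.0


twins: Dict[str, GeneratorTwin] = {}
actions = [
    dispatch(twins, "gen-1", (1500.0, 70.0, 240.0), too_hot),
    dispatch(twins, "gen-1", (1500.0, 95.0, 240.0), too_hot),
    dispatch(twins, "gen-1", (1500.0, 96.0, 240.0), too_hot),
    dispatch(twins, "gen-1", (1500.0, 97.0, 240.0), too_hot),
    dispatch(twins, "gen-2", (1500.0, 95.0, 240.0), too_hot),
]
```

Because each twin keeps its own counter, a run of anomalies on one generator escalates without affecting how any other generator's stream is judged.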

The ScaleOut Digital Twins platform provides an ML Training Tool that lets application developers train various binary classification algorithms supplied by Microsoft’s ML.NET and then deploy them as part of a C# digital twin application. (A companion Model Development Tool provides the same functionality for rules-based digital twins.) In addition, users can supply trained ML algorithms from TensorFlow and deploy them as well.

Here is an example of the workflow used to train an ML algorithm using these tools:

Workflow for training real-time digital twin model

In this workflow, the user supplies an initial training set, which is a list of vectors, each containing input values and a binary classification of normal or anomalous. The tool then trains a candidate set of ML algorithms so that the user can evaluate the results and select an algorithm for deployment. Lastly, the user deploys the algorithm to the ScaleOut Digital Twins platform along with C# or rules-based code that processes incoming messages and invokes the ML algorithm within the digital twins.
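
The shape of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy "decision stump" trainer below stands in for the ML.NET binary classification trainers that the tool actually uses, and `TrainingVector`, `train_stump`, `accuracy`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class TrainingVector:
    inputs: Sequence[float]  # assumed order: (rpm, temperature, voltage)
    anomalous: bool          # binary classification label


Classifier = Callable[[Sequence[float]], bool]


def train_stump(data: List[TrainingVector], feature: int) -> Classifier:
    """Toy trainer: threshold one feature midway between the two class means."""
    normal = mean(v.inputs[feature] for v in data if not v.anomalous)
    bad = mean(v.inputs[feature] for v in data if v.anomalous)
    threshold = (normal + bad) / 2
    high_is_bad = bad > normal
    return lambda x: (x[feature] > threshold) == high_is_bad


def accuracy(classifier: Classifier, data: List[TrainingVector]) -> float:
    """Fraction of vectors whose predicted label matches the recorded one."""
    return sum(classifier(v.inputs) == v.anomalous for v in data) / len(data)


FEATURES = {"rpm": 0, "temperature": 1, "voltage": 2}

train_set = [
    TrainingVector((1500, 70, 240), False),
    TrainingVector((1520, 72, 239), False),
    TrainingVector((1480, 95, 241), True),
    TrainingVector((1510, 98, 238), True),
]
holdout_set = [
    TrainingVector((1490, 71, 239), False),
    TrainingVector((1505, 97, 240), True),
]

# Train one candidate per feature, score each on held-out data, pick the best.
candidates: Dict[str, Classifier] = {
    name: train_stump(train_set, index) for name, index in FEATURES.items()
}
scores = {name: accuracy(clf, holdout_set) for name, clf in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this made-up data only temperature separates the two classes, so the temperature candidate scores highest on the held-out vectors and is the one a user would select for deployment.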

Automatic ML Retraining During Live Operations

An ML algorithm relies on its training set to teach it how to detect anomalies. Once deployed to analyze live telemetry, the algorithm will likely encounter new situations not covered by the initial training set. In these cases, it may either fail to detect anomalies or generate false positives. An ML algorithm should be able to learn as it gains experience in the real world so that it can continuously improve its performance.

That’s what online, automatic retraining can do. When analytics code running in digital twins invokes an ML algorithm and detects invalid classification responses, it can capture the inputs and generate new training data. The ScaleOut platform provides an API that this code can call to save a new vector of training data containing the inputs and the corrected classification.  

Of course, analytics code must be capable of detecting when its ML algorithm generates invalid responses. It can use dynamic state information held with digital twins to perform reasonableness checks on ML results and catch suspect responses. For example, if the ML algorithm detects an anomaly and all input parameters are well within limits, code can classify this response as a false positive. Likewise, if one or more input parameters exceed their limits and the ML algorithm fails to detect an anomaly, code can reclassify the response as anomalous. In addition, code might use other heuristics to evaluate ML responses. The ML algorithm and digital twin code work together: while the ML algorithm attempts to detect subtle issues, code monitors and helps retrain.
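
The two limit checks described above might look like the following Python sketch. The limits, the 10% margin used to define "well within limits", and the function names are assumptions made for illustration; they are not part of the ScaleOut API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical operating limits for each telemetry parameter: (low, high).
LIMITS: Dict[str, Tuple[float, float]] = {
    "rpm": (1000.0, 2000.0),
    "temperature": (40.0, 90.0),
    "voltage": (220.0, 250.0),
}
MARGIN = 0.10  # assumed: "well within" means at least 10% of the range from each bound


def well_within(value: float, low: float, high: float) -> bool:
    pad = (high - low) * MARGIN
    return low + pad <= value <= high - pad


def exceeds(value: float, low: float, high: float) -> bool:
    return value < low or value > high


def corrected_label(inputs: Dict[str, float], ml_says_anomalous: bool) -> Optional[bool]:
    """Return a corrected classification, or None if the ML response looks reasonable."""
    if ml_says_anomalous and all(well_within(inputs[k], *LIMITS[k]) for k in LIMITS):
        return False  # false positive: every parameter is comfortably in range
    if not ml_says_anomalous and any(exceeds(inputs[k], *LIMITS[k]) for k in LIMITS):
        return True   # missed anomaly: at least one parameter is out of range
    return None       # no correction; nothing to add to the training set


false_positive = corrected_label(
    {"rpm": 1500.0, "temperature": 65.0, "voltage": 235.0}, ml_says_anomalous=True)
missed_anomaly = corrected_label(
    {"rpm": 1500.0, "temperature": 95.0, "voltage": 235.0}, ml_says_anomalous=False)
borderline = corrected_label(
    {"rpm": 1500.0, "temperature": 88.0, "voltage": 235.0}, ml_says_anomalous=True)
```

Note the deliberate gap between the two checks: a reading that is inside its limits but close to a bound (the borderline case) is left to the ML algorithm, since that is where subtle issues are most likely to appear.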

All digital twins contribute to building a new training set that extends the original one. Once enough data has been collected, ScaleOut’s in-memory computing platform can automatically retrain the algorithm and redeploy it without affecting ongoing operations.

Steps in automatic machine learning retraining

The ScaleOut Digital Twins hosting platform combines the initial training data with dynamically generated updates and then applies the same training process that the ML Training Tool used to initially train the ML algorithm. For algorithms that require manual retraining, users can periodically download updates, retrain their algorithms, and then upload them to the platform during live operations.
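
As a rough sketch of this collect-then-retrain loop, the Python class below keeps the initial training set, appends corrected vectors as twins report them, and retrains on the combined set once enough new vectors have arrived. The class, its `retrain_after` threshold, and the toy trainer are hypothetical. For simplicity, the sketch retrains while holding the lock; a hosting platform would be expected to do this work in the background.

```python
import threading
from typing import Any, Callable, List, Sequence, Tuple

Vector = Tuple[Tuple[float, ...], bool]  # (inputs, anomalous?)


class RetrainingSet:
    """Accumulates corrected vectors from all twins and retrains on demand."""

    def __init__(self, initial: Sequence[Vector],
                 train: Callable[[List[Vector]], Any], retrain_after: int) -> None:
        self._lock = threading.Lock()
        self._vectors: List[Vector] = list(initial)
        self._train = train
        self._retrain_after = retrain_after
        self._pending = 0  # corrections received since the last retraining
        self.model = train(list(self._vectors))

    def add_correction(self, inputs: Sequence[float], corrected_label: bool) -> bool:
        """Record one corrected vector; return True if this triggered retraining."""
        with self._lock:
            self._vectors.append((tuple(inputs), corrected_label))
            self._pending += 1
            if self._pending < self._retrain_after:
                return False
            self._pending = 0
            # Same training process, applied to initial data plus all updates.
            self.model = self._train(list(self._vectors))
            return True


def toy_train(vectors: List[Vector]) -> dict:
    """Stand-in trainer that only records how much data it was given."""
    return {"trained_on": len(vectors)}


initial = [((1500.0, 70.0, 240.0), False), ((1500.0, 95.0, 240.0), True)]
training = RetrainingSet(initial, toy_train, retrain_after=3)
results = [
    training.add_correction((1500.0, 65.0, 235.0), False),
    training.add_correction((900.0, 65.0, 235.0), True),
    training.add_correction((1500.0, 96.0, 235.0), True),
]
```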

Digital twins seamlessly update their logic to incorporate a new ML algorithm. As they process incoming messages, the ScaleOut platform automatically updates the algorithm they use; they just switch to a new algorithm between message processing steps.
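
One simple way to get this "switch between messages" behavior is for each twin to fetch the current algorithm once at the start of every message, so a newly published algorithm is never swapped in mid-message. The Python sketch below illustrates that idea only; `ModelHolder` and `Twin` are hypothetical names, and the source does not describe how the platform implements the switch internally.

```python
import threading
from typing import Callable, Sequence, Tuple

Classifier = Callable[[Sequence[float]], bool]


class ModelHolder:
    """Shared holder for the currently deployed classifier and its version."""

    def __init__(self, model: Classifier) -> None:
        self._lock = threading.Lock()
        self._model = model
        self._version = 1

    def current(self) -> Tuple[int, Classifier]:
        with self._lock:
            return self._version, self._model

    def publish(self, model: Classifier) -> None:
        """Make a retrained classifier available to all twins."""
        with self._lock:
            self._model = model
            self._version += 1


class Twin:
    def __init__(self, holder: ModelHolder) -> None:
        self._holder = holder

    def process_message(self, telemetry: Sequence[float]) -> Tuple[int, bool]:
        # Fetch once per message: the classifier is fixed for this whole step.
        version, model = self._holder.current()
        return version, model(telemetry)


holder = ModelHolder(lambda t: t[1] > 90.0)   # original model: temperature only
twin = Twin(holder)
before = twin.process_message((2100.0, 70.0, 240.0))

holder.publish(lambda t: t[1] > 90.0 or t[0] > 2000.0)  # retrained stand-in
after = twin.process_message((2100.0, 70.0, 240.0))
```

The same telemetry is classified as normal by version 1 and as anomalous by version 2, and the twin itself needed no change to pick up the new algorithm.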

Summing Up

ML algorithms hosted within digital twins provide an important new dimension to streaming analytics. They enable digital twins to individually monitor large numbers of data sources and continuously analyze incoming telemetry for subtle anomalies that need attention. Automatic retraining takes this capability to a new level by allowing digital twins to continuously improve their ability to detect anomalies with ML as they gain real-world experience. Especially with automatic retraining, ML amplifies the power of digital twins to ensure that large, complex systems run smoothly.

Download a free evaluation version of ScaleOut Digital Twins and try out our ML features today.

About The Author

William Bain, CEO at ScaleOut Software

Dr. William L. Bain is the founder and CEO of ScaleOut Software, which since 2003 has been developing software products designed to enhance operational intelligence within live systems using scalable, in-memory computing technology. Bill earned a Ph.D. in electrical engineering from Rice University. Over a career of more than 40 years focused on parallel computing, he has contributed to advancements at Bell Labs Research, Intel, and Microsoft, and he holds several patents in computer architecture and distributed computing.