Teaching Computers to “See”

Computers are great at identifying patterns, but how do we teach computers what patterns have meaning?


An image contains a wealth of information. It can convey the context of an event, showing what happened, who was involved and what resulted. In recent years, many organizations have adopted imagery as a key data format, for example to improve quality control in semiconductor manufacturing or to analyze tissue samples in pharmaceutical R&D. One of the main challenges with images, however, is efficiently extracting the information they contain. Traditionally, researchers and engineers have reviewed and annotated images manually, but this time- and labor-intensive approach is becoming unsustainable as experiments and operations increase in scale and complexity.

Automating image analysis computationally offers significant promise to tackle this challenge, but it has previously required deep domain expertise to implement effectively. Why is this? The key lies in teaching an image analytics application to understand “meaning.” Consider a program that identifies dogs in pictures: what makes a dog a “dog”? What makes it different from a cat? What if the dog is partially obscured by a bush or is wearing a hat? What if the image is upside down? Each of these factors adds to the complexity of the application. Until recent advances in machine learning, designing such image analytics tools was often too labor-intensive for widespread enterprise use.

The “How:” Convolutional Neural Networks

Modern image analytics leverages the ability of computers to recognize and manipulate patterns. As data, images are fundamentally collections of pixels arranged in a rectangular grid. Taken together, these pixels form edges, shapes, textures and objects. Subconsciously, this is how we see: we build these patterns up into entities to which we ascribe meaning, such as a bottle, a laptop or a bird. Image analytics applications seek to replicate this process.

Machine learning, specifically deep learning, has made image analytics much more accessible. Previous approaches required manual feature definition, i.e., developers told the program what shapes and patterns to look for and what they meant. Convolutional Neural Networks (CNNs), a form of deep learning, generalized this process, allowing the computer to learn from example images which features are meaningful. How is this possible? The multilayered structure of a CNN allows it to break the picture down into more manageable pieces (Figure 1). Convolution layers apply filters to regions of the image to detect features. Each filter produces its own new image, often called a feature map, so a layer outputs a stack of as many feature maps as it has filters. Pooling layers take these feature maps and “downsample” them, reducing their size while preserving the approximate spatial layout of the detected features.
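To make the convolution and pooling steps more concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular product or library implements a CNN. The image, the two hand-written filters, and the helper names `convolve2d` and `max_pool` are all assumptions made for this example; a trained CNN would learn its filter values from data rather than have them typed in. Each filter produces one output image (a "feature map"), and pooling then shrinks that map.

```python
from typing import Dict, List

Image = List[List[float]]


def convolve2d(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image (no padding, stride 1).

    At each position, multiply overlapping values and sum them. Strictly
    speaking this is cross-correlation, which is what deep learning
    tools commonly call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map: Image, size: int = 2) -> Image:
    """Downsample: keep the largest value in each size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 8x8 grayscale image: dark (0) with a bright (1) strip
# along the right-hand side.
image: Image = [[0, 0, 0, 0, 0, 0, 1, 1] for _ in range(8)]

# Two hand-written 3x3 filters, chosen purely for illustration.
filters: Dict[str, Image] = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}

# Convolution layer: one 6x6 feature map per filter.
feature_maps = {name: convolve2d(image, k) for name, k in filters.items()}

# Pooling layer: each 6x6 map shrinks to 3x3.
pooled = {name: max_pool(fm) for name, fm in feature_maps.items()}

for name in filters:
    print(name, pooled[name])
```

In this toy case the vertical-edge filter responds only in the right-hand column of its pooled map, which is where the dark-to-bright boundary sits, while the horizontal-edge filter stays at zero everywhere because the image has no horizontal edges. The pooled maps are smaller than the originals but still indicate roughly where each feature was found.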

Figure 1. CNNs pass image data through multiple layers of filters which identify increasingly complex features. The final layers take these inputs and classify what the image is (in this case, a dolphin).


As data travels through the convolution and pooling layers of the CNN, increasingly complex filters can be applied: early filters may identify simple features like edges, curves or corners, while later ones may identify entire objects such as dogs, cars or faces. This overall process is called “feature extraction.” The final layers then classify the results of these steps, weighing the different outputs to determine how “sure” the model is that the cat it is looking at is, in fact, a cat and not a small dog. This points to another benefit of CNNs: the final layers can be used for a variety of decisions, from identifying objects in static images to directing the actions of other programs and devices.
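One common way for the final layers to express how “sure” a model is involves computing a weighted score per label and passing the scores through a softmax function, which converts them into probabilities that sum to 1. The sketch below assumes that approach. The feature values, weights, labels and the helper names `softmax` and `classify` are all hypothetical, invented for this illustration; in a real CNN the weights would be learned during training.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: List[float],
             weights: Dict[str, List[float]],
             biases: Dict[str, float]) -> Dict[str, float]:
    """Weigh the extracted features for each label, then apply softmax."""
    labels = list(weights)
    scores = [
        sum(w * f for w, f in zip(weights[label], features)) + biases[label]
        for label in labels
    ]
    return dict(zip(labels, softmax(scores)))


# Hypothetical output of feature extraction, as strengths between 0 and 1:
# [pointed ears, whiskers, long snout]
features = [0.9, 0.8, 0.1]

# Hypothetical weights: how strongly each feature argues for each label.
weights = {
    "cat": [1.5, 2.0, -1.0],
    "small dog": [0.5, 0.2, 2.0],
}
biases = {"cat": 0.0, "small dog": 0.0}

confidence = classify(features, weights, biases)
best = max(confidence, key=confidence.get)
print(best, confidence)
```

With these made-up numbers, strong “pointed ears” and “whiskers” signals outweigh the weak “long snout” signal, so the model favors “cat.” The same confidence values could just as easily feed a downstream decision, such as flagging an item for review when no label is a clear winner.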

Putting CNNs into Practice

The increasing use of CNNs as the driver for image analytics applications is opening many organizations’ eyes to their possible application in meeting specific business needs:

  • Oil & Gas companies are considering the applicability of image analytics in automating pipeline monitoring for corrosion to identify potential issues sooner and minimize the likelihood of leaks.
  • Auto manufacturers are exploring using image analytics to identify safety QA issues on the manufacturing floor, flagging out-of-order or missing steps on individual car assembly lines.
  • Agriculture operations are adopting applications to monitor crop yields by visually assessing plant health and tying their outcomes to other IoT sensors and programs.
  • Pharmaceutical R&D teams are accelerating experimental throughput by automating the interpretation of the results of large-scale experiments like high-throughput protein crystallization screens or microscopy images.
  • Food & Beverage groups are exploring their use in quality control in manufacturing, assessing product and packaging appearance to help guarantee positive customer experiences.

As organizations consider transitioning their offices, labs and shop floors in light of the COVID-19 pandemic, image analytics tools can help ensure worker safety by tracking the use of masks, flagging work that may violate social distancing guidelines, and monitoring for elevated body temperatures.

In each instance, CNNs can tackle highly varied and specific problems, but the key hindrance to their further proliferation is the difficulty of developing these models at scale. Data sets need to be effectively curated and CNNs must be properly trained and tuned, both of which have previously required significant domain expertise. To maximize the impact of CNNs, experts need to lower the barrier to entry so that citizen data scientists can fully leverage this machine learning technology.

The Imaging Collection of BIOVIA Pipeline Pilot provides prebuilt learners for CNNs as a single component, allowing developers and non-experts alike to quickly clean data and train these models in a code-free environment. Existing models and model architectures developed with Python and third-party libraries such as TensorFlow can also be packaged up and shared as drag-and-drop objects. Scientists can then share these objects with colleagues or deploy them as applications, web services or widgets at the enterprise level, accelerating the growth and adoption of these deep neural networks. Check out this short webinar to learn more.


Sean McGee

Sean McGee is the Technical Marketing Manager for the BIOVIA brand of Dassault Systèmes. He has spent his career exploring the application of computational techniques in chemistry, specializing in data science, machine learning, and molecular modeling and simulation. At BIOVIA, Sean oversees the strategic positioning and communication of BIOVIA's solutions for upstream R&D in the life sciences, bulk and specialty chemicals, and consumer goods industries.
