Whether in storage, production or customer service, very different business processes all involve images that need to be analyzed and evaluated. However, evaluating these images manually is time-consuming and error-prone. These procedures can be automated with the help of computer vision, i.e. the machine analysis and processing of images. Thanks to mature methods, machines are now able to carry out even complicated analyses.
What Now - Dog or Cat?
If pictures of a cat and a dog are placed next to each other, even small children can tell the animals apart. But which attributes make this possible? What set of rules differentiates the two animals?
Dogs are bigger than cats <-> But a Maine Coon is bigger than a Chihuahua
Dogs have more pointed muzzles <-> What about the pug?
Cats have thick whiskers <-> But even without whiskers, a cat remains a cat
Although this is a trivial question that a human can answer intuitively, creating a robust set of rules for it requires an enormous amount of effort. But is such a set of rules even necessary for a machine to analyze images fully automatically?
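To make the brittleness of hand-written rules concrete, here is a minimal sketch that encodes the three rules listed above. The `Animal` fields, the 6 kg threshold and all sample values are illustrative assumptions, not measurements from the article.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    name: str
    species: str          # ground truth: "dog" or "cat"
    weight_kg: float      # made-up illustrative value
    pointed_muzzle: bool
    thick_whiskers: bool


def rule_based_guess(animal: Animal) -> str:
    """Apply the three hand-written rules in order."""
    if animal.weight_kg > 6.0:      # "dogs are bigger than cats" (threshold is an assumption)
        return "dog"
    if animal.pointed_muzzle:       # "dogs have more pointed muzzles"
        return "dog"
    if animal.thick_whiskers:       # "cats have thick whiskers"
        return "cat"
    return "dog"                    # arbitrary fallback when no rule matches


animals = [
    Animal("Labrador", "dog", 30.0, True, False),
    Animal("House cat", "cat", 4.0, False, True),
    Animal("Maine Coon", "cat", 8.0, False, True),
    Animal("Chihuahua", "dog", 2.0, True, False),
    Animal("Pug", "dog", 7.0, False, False),
    Animal("Cat without whiskers", "cat", 4.0, False, False),
]

# Collect every animal for which the rules give the wrong answer.
mistakes = [a.name for a in animals if rule_based_guess(a) != a.species]
print(mistakes)
```

In this toy data the rules misclassify the Maine Coon and the cat without whiskers, and the pug is only classified correctly because the weight rule happens to fire first. Each additional exception would need another rule, which is the effort described above.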
How Machines Learn
Instead of going through an elaborate process of defining rules, we can give machines the opportunity to discover or learn underlying patterns independently. The learning process can be divided into four basic steps:
Labeling: Machines learn on the basis of a data set. First of all, many sample images must be collected. These are then labeled, i.e. each sample image is manually assigned to one of the two classes (dog | cat). It is important to select sample images which are as representative as possible. If all cat pictures are taken in living rooms, but all dog pictures in meadows, the machine might inadvertently learn the difference between the environments instead of the animals.
Splitting: We then have to split our labeled data set into two independent sets: training and testing. The training data set is used to train our model. The test data set is used afterwards to determine whether the model has really learned to distinguish between dogs and cats, and has not simply memorized all the images in the training data set.
Training: We now show the model all the images of our training set and tell it the correct class of each image. Based on these labels, the model is able to find patterns which can be used to distinguish between dogs and cats.
Testing: After the model has been trained, it is shown the images of the test data set – but without any classification. The model must then decide whether each image represents a dog or a cat. The model's performance can subsequently be evaluated by comparing the model's decisions with the actual classes.
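The four steps above can be sketched in a few lines. This is only a toy illustration under stated assumptions: instead of real images, each sample is a pair of made-up numeric features, and the "model" is a simple nearest-centroid classifier rather than the neural networks typically used for computer vision. All function names are hypothetical.

```python
import random

random.seed(0)


# 1. Labeling: each sample is (features, label). The two numeric features are
#    synthetic stand-ins for what a real model would extract from pixels.
def make_sample(label):
    center = (2.0, 2.0) if label == "dog" else (-2.0, -2.0)
    features = tuple(c + random.gauss(0, 1.0) for c in center)
    return features, label


dataset = [make_sample("dog") for _ in range(100)]
dataset += [make_sample("cat") for _ in range(100)]

# 2. Splitting: shuffle, then keep 80% for training and 20% for testing.
random.shuffle(dataset)
split = int(0.8 * len(dataset))
train_set, test_set = dataset[:split], dataset[split:]


# 3. Training: the model sees features together with the correct label and
#    stores the mean feature vector (centroid) of each class.
def train(samples):
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def predict(model, features):
    # Return the class whose centroid is closest to the given features.
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(train_set)

# 4. Testing: the model only sees the features; its decisions are then
#    compared with the held-back labels.
correct = sum(predict(model, features) == label for features, label in test_set)
accuracy = correct / len(test_set)
print(f"test accuracy: {accuracy:.2f}")
```

Because the test samples were never used in `train`, the accuracy reflects how well the learned pattern generalizes rather than how well the training data was memorized.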
What Can Models Do?
The example above is a simple application of computer vision: the model only has to decide what can be seen in an image. This is single-label image classification, which only determines whether the image shows a dog or a cat. However, other model types are available for a variety of tasks:
Anomaly detection: The model learns how images normally appear and can detect deviations from them. "Negative examples" are not needed here for training.
Image classification: As mentioned earlier, the model recognizes the contents of an image. A distinction can be made here between single-label and multi-label classification. In a multi-label problem, for example, an image showing both a dog and a cat can be assigned to multiple classes.
Object detection: The model detects the position of objects and assigns each object to a class. For example, it returns the coordinates of a rectangle enclosing a cat in an image.
Image segmentation: The model classifies each pixel. This allows a cat to be clearly distinguished from a background, for example.
Instance segmentation: Object detection is first carried out to identify the cat and its position. In addition, the cat's outline is determined with pixel precision.
These methods differ not only in the problems they solve, but also in the effort involved in creating the model. The labeling effort also depends largely on the selected model type. For each use case, it is important to find the model type (or a combination of several) best suited to the problem at hand.
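One way to see why the effort differs is to look at what each model type has to output, which is also what a human has to provide as a label. The following sketch uses a tiny hand-made mask instead of a real image. The `BoundingBox` type and the `box_from_mask` helper are hypothetical names chosen for illustration, not part of any specific library.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


# Anomaly detection: a single score per image (higher = less normal).
anomaly_score = 0.07

# Image classification: one label, or a set of labels in the multi-label case.
single_label = "cat"
multi_label = {"cat", "dog"}

# Image segmentation: one class per pixel (1 = cat, 0 = background).
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
]


def box_from_mask(pixel_mask, label):
    """Return the smallest rectangle enclosing all foreground pixels."""
    rows = [y for y, row in enumerate(pixel_mask) if any(row)]
    cols = [x for row in pixel_mask for x, value in enumerate(row) if value]
    return BoundingBox(label, min(cols), min(rows), max(cols), max(rows))


# Object detection: a list of labeled rectangles, one per object.
detections = [box_from_mask(mask, "cat")]

# Instance segmentation: each object has both a rectangle and its own mask.
instances = [(detections[0], mask)]

print(detections[0])
```

A class label is a single value per image, a bounding box is four coordinates per object, and a mask is one value per pixel, so the labeling effort grows with the level of detail the model is expected to produce. Note that the helper derives the box from the mask only to show how the two outputs relate; it is not meant to describe how a detection model works internally.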