Multi-class vs. Multi-label Classification

The most common AI Text Analytics task is classification. Document classification serves a wide range of use cases, including sentiment and emotion analysis, intent detection, news categorization, and email classification. Few business processes can be automated without a classifier, and there are so many variants of what a “class” should be that classifiers are usually custom-built on the client’s data and based on a bespoke taxonomy. In AI Factory 2.0 we have introduced support for multi-label classification, in addition to multi-class.

Read the AI Factory version 2.0 release announcement.

Multi-class and multi-label classification are two different approaches used in machine learning to solve different types of classification problems. Let’s break down each concept with examples:

Multi-Class Classification

In multi-class classification, each data point is assigned to one and only one class or category. This means that the classes are mutually exclusive, and a single prediction is made for each data point. Multi-class classification is used when there are distinct and non-overlapping categories or classes.

Example: Handwritten digit recognition is a classic example of multi-class classification. In this task, the goal is to classify each handwritten digit (0-9) into one of the ten possible classes. Each digit can only belong to one category, so it’s a multi-class classification problem.
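The "one and only one class" rule can be sketched in a few lines. This is a minimal illustration, not how any particular engine works: it assumes a model that outputs one raw score per class, and the `scores` values and the `predict_multi_class` helper are hypothetical. The scores are converted to probabilities that sum to 1, and only the top class is returned.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_multi_class(scores, classes):
    """Return the single most probable class (classes compete with each other)."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]


digits = [str(d) for d in range(10)]
# Hypothetical raw scores for one handwritten digit image.
scores = [0.1, 0.3, 0.2, 4.0, 0.5, 0.1, 0.0, 1.2, 0.4, 0.3]
prediction = predict_multi_class(scores, digits)
print(prediction)
```

Because the probabilities share a single budget of 1, raising the score of one class lowers the probability of every other class, which is what makes the classes mutually exclusive.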

Multi-Label Classification

In multi-label classification, each data point can be assigned to multiple classes or categories simultaneously. This means that a data point can belong to more than one class, and multiple predictions can be made for each data point. Multi-label classification is used when the classes are not mutually exclusive, and instances can have multiple attributes or labels.

Example: A common example of multi-label classification is text classification for news articles or blog posts. A single article can belong to multiple categories simultaneously, such as “technology,” “business,” and “sports.” The labels are not exclusive, so an article can fall into any combination of these categories.
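A common way to allow several labels per data point is to score each label independently and keep every label that passes a threshold. The sketch below assumes that approach; the `scores`, the 0.5 threshold, and the `predict_multi_label` helper are illustrative assumptions rather than a description of a specific product.

```python
import math


def sigmoid(score):
    """Map a raw score to an independent probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_multi_label(scores, labels, threshold=0.5):
    """Return every label whose own probability reaches the threshold."""
    return [label for label, s in zip(labels, scores) if sigmoid(s) >= threshold]


labels = ["technology", "business", "sports"]
# Hypothetical raw scores for one news article.
scores = [2.1, 0.7, -3.0]
predicted = predict_multi_label(scores, labels)
print(predicted)
```

Each label is decided on its own, so the result can contain several labels, one label, or none at all.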

Here’s a simplified example:

Suppose you have a set of news articles, and you want to categorize them into topics like “Politics,” “Health,” “Entertainment,” and “Science.” In multi-class classification, each article would be assigned to one of these topics exclusively. So, if an article is about politics, it will be categorized as “Politics” only.

In contrast, in multi-label classification, an article could be assigned multiple labels simultaneously. For instance, an article discussing the intersection of politics and health might be categorized as both “Politics” and “Health” since it covers aspects of both topics.

To summarize, multi-class classification assigns each data point to a single category, while multi-label classification allows a data point to belong to multiple categories simultaneously, making it suitable for scenarios where labels are not mutually exclusive.
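The same contrast shows up in how training targets are usually encoded. As a small sketch using the four topics from the example above (the helper names `one_hot` and `multi_hot` are hypothetical), a multi-class target has exactly one position set, while a multi-label target can have any number set.

```python
TOPICS = ["Politics", "Health", "Entertainment", "Science"]


def one_hot(topic):
    """Multi-class target: exactly one position is 1."""
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic}")
    return [1 if t == topic else 0 for t in TOPICS]


def multi_hot(topics):
    """Multi-label target: any number of positions can be 1."""
    unknown = set(topics) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if t in topics else 0 for t in TOPICS]


multi_class_target = one_hot("Politics")
multi_label_target = multi_hot({"Politics", "Health"})
print(multi_class_target)
print(multi_label_target)
```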

Going beyond Text Analytics, multi-label classification matters in many real-world applications where objects or data points can carry several attributes or labels at once. Here are some additional examples, and why multi-label classification matters in each:

  • Image Classification with Multiple Objects:
    • Example: In image classification, an image may contain multiple objects or entities that need to be identified. For instance, in a wildlife monitoring system, an image of a forest might contain several different animal species like tigers, deer, and birds.
    • Importance: Multi-label classification is crucial here because a single image can contain multiple objects, and assigning only one label would not capture the full context of the image.
  • Recommendation Systems:
    • Example: In a movie recommendation system, a user’s preferences can span multiple genres, such as action, romance, and comedy. Each movie can belong to one or more genres.
    • Importance: Multi-label classification enables the recommendation system to suggest movies that match the user’s diverse interests by considering multiple genres per movie.
  • Text Classification for Sentiment Analysis:
    • Example: Sentiment analysis of customer reviews or social media posts works with categories such as positive, negative, and neutral, and a single text may need more than one of them, for example when it praises one aspect and criticizes another.
    • Importance: Multi-label classification allows for a nuanced understanding of sentiment, as a piece of text can express mixed sentiments or cover multiple aspects.
  • Medical Diagnosis:
    • Example: In medical diagnosis, a patient’s symptoms can be indicative of multiple diseases or conditions. Each patient’s case may involve multiple diagnoses.
    • Importance: Multi-label classification helps doctors and healthcare professionals consider various possible conditions simultaneously, improving the accuracy of diagnosis and treatment planning.
  • Tagging in Content Management Systems:
    • Example: Content management systems (e.g., blogs, articles) often require tagging content with relevant keywords or topics. A single piece of content can be associated with multiple tags.
    • Importance: Multi-label classification enhances content discoverability by allowing users to find articles or resources that cover a combination of topics or themes.
  • Music Genre Classification:
    • Example: In music genre classification, a song can belong to multiple genres or sub-genres simultaneously, such as rock, alternative, and acoustic.
    • Importance: Multi-label classification enables music platforms to provide more accurate genre recommendations to users who have eclectic tastes.
  • Fault Detection in Industrial Systems:
    • Example: In manufacturing or industrial settings, a machine or system can exhibit multiple types of faults or anomalies simultaneously.
    • Importance: Multi-label classification is critical for identifying and addressing multiple issues in real time, which helps ensure the quality and safety of products.

In AI Factory 2.0 you can train both multi-class and multi-label classifiers

In the AI Factory project creation dashboard, at the TRAINING step, if you choose a Classifier as the trainer task, you have the option to select a “Multi-Class” or “Multi-Label” engine.

The important thing to keep in mind is that the training data structure is the same for multi-class and multi-label. The main difference is that, in multi-label, the same file can be part of multiple classes. You can either create the classes manually in the user interface and upload the training data files for each defined class, or upload the entire folder structure directly.
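To make the "same file in multiple classes" idea concrete, here is a minimal sketch of reading such a folder structure. It rests on assumptions that are not confirmed here: that each class is a top-level folder named after the class, and that a file counts as "the same" when its file name matches across folders. The `labels_from_folders` helper and the sample file names are hypothetical.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def labels_from_folders(root):
    """Map each file name to the sorted list of class folders that contain it."""
    labels = defaultdict(set)
    for class_dir in Path(root).iterdir():
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        for path in class_dir.iterdir():
            if path.is_file():
                labels[path.name].add(class_dir.name)
    return {name: sorted(classes) for name, classes in labels.items()}


# Build a tiny throwaway dataset: article1.txt sits in two class folders.
samples = [
    ("Politics", "article1.txt"),
    ("Health", "article1.txt"),
    ("Science", "article2.txt"),
]
with tempfile.TemporaryDirectory() as root:
    for class_name, file_name in samples:
        folder = Path(root) / class_name
        folder.mkdir(exist_ok=True)
        (folder / file_name).write_text("sample text")
    dataset = labels_from_folders(root)

print(dataset)
```

In a multi-class dataset every file name would map to exactly one class, so a quick check for entries with more than one class is a simple way to tell which of the two engines a given folder structure is suited to.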