It’s AutoMagical! How does the AI Factory hyperparameter optimization work?

A new and exciting feature was added to the AI Factory version 2.0 release: The AutoMagical Settings switch for the Classifier trainers. With just a click, Factory will automatically (and magically!) determine the best hyperparameters for the training process to obtain the best AI classifier from your data.

Read the AI Factory version 2.0 release announcement.

Let’s see how this works

Hyperparameter optimization is a process used in machine learning to determine the best set of hyperparameters for a particular model; in our particular case, to fine-tune a classifier’s performance.

In simple terms, a classifier is like a decision-making tool that can analyze data and make predictions or decisions based on that data. To make these decisions, the Classifier Trainer relies on certain settings, called hyperparameters (the sketch after this list shows where each one acts):

  • Epochs. Think of your business receiving an annual report. Each time you go through this report from beginning to end to understand its contents and categorize it, that’s like one “epoch.” If you analyze and categorize the report 10 times to refine your understanding, you’ve gone through 10 epochs. In the context of the classifier, an epoch means the model has processed and learned from the entire set of documents once.
  • Early Stop. Imagine your team is categorizing documents and they notice that after a certain point, repeatedly reviewing the same documents isn’t leading to better categorization—it’s just making the process more complex. They decide to stop the review process because it’s no longer beneficial. This is “early stopping” in machine learning: if the model starts to overfit or doesn’t gain a better understanding after a certain point, we stop its training.
  • Batch Size. Instead of your team reviewing and categorizing each document one by one, they decide to bundle documents together and categorize them in groups. This speeds up the process. The size of each group (or bundle) is the “batch size.” In machine learning, it means processing a set of documents at once rather than individually.
  • Learning Rate. As feedback comes in about the accuracy of the categorization, the team decides how quickly to adjust their categorization strategy. If they change their approach too quickly based on a few pieces of feedback, they might become inconsistent. If they adjust too slowly, they might not improve their process efficiently. The learning rate in machine learning dictates how quickly the model adjusts its classification strategy based on the documents and feedback it receives.
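To make these four dials concrete, here is a minimal, self-contained sketch of a training loop that uses all of them. The toy model (it fits a single number) and every name in it are ours for illustration, not AI Factory’s actual API; a real classifier trainer is far more elaborate.

```python
# A toy training loop showing where each hyperparameter acts.
# All names are illustrative, not AI Factory's actual API.
def chunks(items, size):
    """Split the training data into batches of the given size."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def train(data, epochs=10, early_stop=3, batch_size=32, learning_rate=0.1):
    weight = 0.0                       # the "model": a single parameter
    best_loss, stale = float("inf"), 0
    for epoch in range(epochs):                    # Epochs: full passes
        for batch in chunks(data, batch_size):     # Batch Size: grouped work
            error = sum(weight - y for y in batch) / len(batch)
            weight -= learning_rate * error        # Learning Rate: step size
        loss = sum((weight - y) ** 2 for y in data) / len(data)
        if loss < best_loss - 1e-9:
            best_loss, stale = loss, 0
        else:
            stale += 1                             # no improvement this epoch
        if stale >= early_stop:                    # Early Stop: give up early
            break
    return weight, best_loss

weight, loss = train(data=[1.0, 2.0, 3.0, 4.0], batch_size=2)
print(f"fitted weight = {weight:.3f}, loss = {loss:.4f}")
```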

In the previous version of AI Factory, these hyperparameters had default settings that the user could manually modify. The defaults were usually good enough for most machine learning projects, as they were defined by Zetta Cloud based on numerous previous experiments. Changing these parameters manually could produce different results, so some experimentation with them was expected.

How does the AutoMagical setting work?

Think of hyperparameters like dials on a machine that controls how well it performs. Each dial can be turned to different levels, and the combination of these settings can affect the machine’s overall performance. Hyperparameter optimization is the process of finding the best combination of settings (dial levels) for the classifier to perform at its best. In a business context, this could mean improving the accuracy of predicting customer preferences, enhancing fraud detection, or optimizing marketing strategies. By fine-tuning these settings, businesses can make better decisions, ultimately leading to increased efficiency, customer satisfaction, and profitability.

When the AutoMagical Settings switch is turned on, Factory will find the best combination of settings (dial levels) for the classifier to perform at its best. The values that make up each combination are selected by Factory from given value intervals. Factory uses a smart method (PySOT) to pick which combinations to try next, focusing on those that seem most promising based on the surrogate model’s predictions and the results so far. It’s like prioritizing the most promising business decisions based on previously learned business experiences.

How many combinations of settings should Factory try in one optimization session? “Number of samples” refers to the number of different sets of hyperparameter values that will be tested by the optimization algorithm. A smaller number means faster optimization but potentially less accurate results, while a larger number increases the chances of finding the best hyperparameters but requires more computational resources. The goal is to find a balance that offers a good trade-off between accuracy and efficiency.
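In code, the inputs to such an optimization session boil down to two things: the interval each dial may range over, and the evaluation budget. A hypothetical sketch follows; the names and values are illustrative, not AI Factory’s actual configuration.

```python
# Hypothetical search space: the interval each dial may range over.
# Every trial combination is drawn from these intervals.
search_space = {
    "epochs": (5, 50),             # integer-valued dial
    "batch_size": (8, 128),        # integer-valued dial
    "learning_rate": (1e-5, 1e-2), # continuous dial
}

# Evaluation budget: how many combinations the session will actually
# train and score. A larger budget raises the odds of finding the best
# settings but costs proportionally more compute time.
number_of_samples = 30
```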

Be aware that “magic” comes with a cost: AI Factory will need more time and resources to try candidate combinations until it finds the best of the best.

Under the hood: the PySOT algorithm

Machine learning hyperparameter tuning is done using the PySOT algorithm: the Python Surrogate Optimization Toolbox for global deterministic optimization problems.

How does PySOT work? Imagine you’re a scientist trying to find the best formula for a new medicine. There are many ingredients you can mix in different amounts to make the medicine. The challenge is finding the right combination that is most effective.

Testing every single possible combination would take a long time and a lot of resources. This is where PySOT, or the Python Surrogate Optimization Toolbox, comes in.

  1. Surrogate Model: At its heart, PySOT uses something called a “surrogate model”. Think of this as a simplified version of the real world. Instead of conducting expensive and time-consuming experiments in the real world, you can test things quickly in this model to get an idea of what might work best.
  2. Sampling Strategy: Initially, you might randomly test a few combinations in the real world to see how effective they are. These tests give you initial data points.
  3. Predict & Explore: With those initial data points, the surrogate model makes educated guesses about where the best formula might be. It might predict that certain combinations are more promising than others. You then test these predictions in the real world to see if they’re accurate.
  4. Update the Model: Based on the results of your new tests, the surrogate model is updated. It becomes smarter and better at predicting which combinations might be effective.
  5. Iterative Process: You keep repeating this process: using the model to predict promising combinations, testing those predictions in the real world, and then updating the model based on the results.

The goal is to find the best combination with the fewest possible real-world tests.
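The loop below is a deliberately tiny, self-contained imitation of steps 1 through 5, minimizing a stand-in objective over a single dial. It is a conceptual sketch, not pySOT’s actual code; the real library uses radial basis function surrogates and much more careful sampling strategies.

```python
import math
import random

def expensive_objective(x):
    """Stand-in for a real training run: returns a loss to minimize."""
    return (x - 0.3) ** 2 + 0.1 * math.sin(20 * x)

def surrogate_predict(x, tested):
    """Cheap surrogate: inverse-distance-weighted average of known results."""
    total_weight = weighted_sum = 0.0
    for xi, yi in tested:
        w = 1.0 / ((x - xi) ** 2 + 1e-9)
        total_weight += w
        weighted_sum += w * yi
    return weighted_sum / total_weight

random.seed(0)

# Steps 1-2: evaluate a few random points for real to seed the surrogate.
tested = [(x, expensive_objective(x)) for x in [random.random() for _ in range(4)]]

# Steps 3-5: score many cheap candidates on the surrogate, run only the
# single most promising one for real, then fold the result back in.
for _ in range(10):
    candidates = [random.random() for _ in range(200)]
    best = min(candidates, key=lambda c: surrogate_predict(c, tested))
    tested.append((best, expensive_objective(best)))

x_best, y_best = min(tested, key=lambda t: t[1])
print(f"best setting: x = {x_best:.3f}, loss = {y_best:.4f}")
```

Note that only 14 calls to the expensive objective are made, while the surrogate is consulted thousands of times; that asymmetry is the entire point of the method.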

A key concept in PySOT is the “trade-off between exploration and exploitation”. “Exploration” means trying out new areas or combinations you haven’t tested before. “Exploitation” means diving deeper into areas you already know are promising. PySOT balances these two strategies to find the best solution efficiently.
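One way to picture that balance in code, reusing surrogate_predict from the sketch above: score each candidate by its predicted loss, minus a bonus for being far from everything already tested. This is a conceptual illustration, not pySOT’s actual acquisition rule.

```python
def acquisition_score(x, tested, explore_weight=0.05):
    predicted = surrogate_predict(x, tested)         # exploitation term
    distance = min(abs(x - xi) for xi, _ in tested)  # exploration term
    return predicted - explore_weight * distance     # lower is better
```

With explore_weight at zero the search purely exploits what it already knows; raising it makes the search increasingly favor untested regions.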

In summary, PySOT is an optimization method that uses a surrogate model to predict promising solutions. By alternating between real-world testing and updating the model, it aims to find the best solution with fewer tests.
