ML adaptation for RNN and LLM models in AI Factory 2.0


The AI Factory no-code Machine Learning dashboard offers training capabilities for Classification and Named Entity Recognition. Each option relies on different algorithms:

  • Entity Extractor XT (Entity Extractor – extended) using Deep Learning with Transformers, by fine-tuning an underlying model (BERT).
  • Automatic Classifier using Deep Learning with Recurrent Neural Networks (RNN).
  • Automatic Classifier XT (Automatic Classifier – extended) using Deep Learning with Transformers, by fine-tuning an underlying model (BERT).

Whether you use the RNN Classifier or the Transformers-based (XT) Classifier depends largely on how much training data you have and how quickly you want to reach production with a satisfying result. A detailed explanation is available in this blog post:

One of the capabilities added in AI Factory 2.0 is the ability to adapt Transformer-based models for Classification, by fine-tuning the underlying pre-trained BERT model on your specific training data.

Read the AI Factory version 2.0 release announcement.

Understanding Model Adaptability for Classifier XT

The Automatic Classifier XT trainer has a new “Adaptable” switch, which is on by default. When “Adaptable” is on, the weights of the underlying BERT model are updated during training: the model fine-tunes its pre-trained weights based on your specific dataset. When “Adaptable” is off, the BERT weights remain fixed and are not updated during training, effectively “freezing” the base model. When you freeze the weights of a pre-trained model like BERT, you leverage the knowledge it already has without modifying it; only the additional layers on top (the classification head) are trained.
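The effect of the switch can be illustrated with a deliberately tiny toy model (this is a conceptual sketch, not AI Factory or BERT code): a one-feature “base” weight standing in for the pre-trained body, and a “head” weight standing in for the classification head. With `adaptable=False`, gradient updates skip the base weight, just as freezing BERT leaves only the head trainable.

```python
# Conceptual sketch (not AI Factory code) of the "Adaptable" switch.
# Toy model: prediction = head_w * (base_w * x).
# base_w plays the role of the pre-trained BERT body,
# head_w plays the role of the classification head.

def train(x, y, adaptable, steps=200, lr=0.01):
    base_w, head_w = 1.0, 0.5            # "pre-trained" base, fresh head
    for _ in range(steps):
        pred = head_w * (base_w * x)
        err = pred - y                   # gradient of the squared error
        if adaptable:                    # "Adaptable" ON: body also trains
            base_w -= lr * err * head_w * x
        head_w -= lr * err * base_w * x  # the head always trains
    return base_w, head_w
```

With `adaptable=False`, `base_w` comes back unchanged at `1.0` while the head alone absorbs the training signal; with `adaptable=True`, both weights move to fit the data.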

How does this work?

Imagine our AI model as a seasoned chef who has mastered many recipes from a renowned culinary school. Your training data is like a unique set of ingredients you’re providing to this chef.

When “Adaptable” is set to “ON”, you’re allowing the chef to tweak and refine the entire recipe based on your specific ingredients. This ensures the dish is tailor-made for your taste. It’s recommended to turn this ON when you believe your ingredients (or data) have some unique flavors that the chef hasn’t encountered in culinary school.

When “Adaptable” is set to “OFF”, however, the chef uses the foundational recipes he learned but adjusts only the final garnishing based on your ingredients. In AI terms, only the last layer of the model is trained. This means the core knowledge remains unchanged, but the model still learns how to present the results in a way that best suits your data. It’s like trusting the chef’s original training for the main course but allowing some flexibility in the final presentation. This approach is beneficial when you want to leverage the model’s foundational knowledge but still need it to understand the specific nuances of your data.

In essence, with “Adaptable” ON, the model can adjust its entire knowledge for your data. With it OFF, the model maintains its core understanding but fine-tunes the final output based on your specific needs.

Fine-tuning for XLU models

Another feature added in AI Factory 2.0 is Machine Learning fine-tuning for the cross-language (XLU) models used by Classifier XT and Entity Extractor XT. XLU models let you train your task with a corpus in one language and have it handle many other languages without further intervention. For example, you could train your engine with an English corpus, and it will be capable of understanding texts in Arabic or Chinese.
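The idea behind cross-language models can be illustrated with a toy sketch (again, not AI Factory code, and the word-to-ID table below is entirely made up): if words from different languages map into one shared representation space, a classifier “trained” only on English examples can still score text in other languages.

```python
# Toy illustration (not AI Factory code) of the XLU principle:
# aligned words from different languages share one concept ID,
# standing in for a shared multilingual embedding space.
shared_space = {
    "invoice": 0, "facture": 0,   # concept 0: invoice
    "contract": 1, "contrat": 1,  # concept 1: contract
}

def classify(text, concept_labels):
    """Label text by the first known concept its words map to."""
    ids = [shared_space[w] for w in text.split() if w in shared_space]
    return concept_labels[ids[0]] if ids else None

# "Trained" with English data only: concept 0 -> billing, 1 -> legal.
labels = {0: "billing", 1: "legal"}
```

Because “facture” and “invoice” land on the same concept ID, `classify("veuillez payer cette facture", labels)` returns `"billing"` even though no French text was ever seen at training time. Real XLU models achieve this with multilingual sentence representations rather than a lookup table, but the principle is the same.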

All you need to do to fine-tune your own Classifier or Entity Extractor for cross-language understanding is to select “Multi-Language” instead of a specific language when you create your Classifier XT or Entity Extractor XT project.
