One of the key differentiators of AI Factory's Text Analytics capabilities is its ability to process content at tremendous speed, even on commodity hardware. With the AI Factory version 2.0 release, this capability was enhanced across the entire pipeline, from machine learning to inference, with hardware optimization applied to all AI engines generated in the no-code dashboard.
Read the AI Factory version 2.0 release announcement.
In a world dominated by huge AI models that require extensive hardware and computational resources for inference, with the associated CO2 emissions and consumption of natural resources, we believe it is important to produce Artificial Intelligence engines that use hardware resources optimally, without any impact on quality. A great deal of the engineering effort in the AI Factory v2.0 release went into the AI engines that the no-code Machine Learning tool produces. These are not "just" models, but production-ready software that runs from the command line, with encrypted code, a standardized REST API, SSL and API key support, a web dashboard and, of course, low hardware requirements. In our latest benchmarks we obtained throughputs in the order of millions of words per minute on CPU-only machines, which speaks to the tremendous data processing capabilities our AI software offers.
George Bara, Chief Strategist
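Since every generated engine ships with a standardized REST API, calling one typically amounts to a single authenticated JSON request. The sketch below is a minimal, hypothetical client: the endpoint URL, request schema, and `X-API-Key` header name are illustrative assumptions, not the documented AI Factory interface; consult your engine's own API reference for the actual paths and fields.

```python
import json

# Hypothetical endpoint for a generated engine's REST API.
# The path, port, and header name below are assumptions for
# illustration only; the real values come from the engine's docs.
API_URL = "https://localhost:8443/api/v1/analyze"

def build_request(api_key: str, text: str) -> tuple[dict, bytes]:
    """Build the headers and JSON body for a scoring request.

    Returns the headers dict and the UTF-8 encoded JSON payload,
    ready to be sent with any HTTP client over SSL.
    """
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # assumed API-key header name
    }
    body = json.dumps({"text": text}).encode("utf-8")
    return headers, body

headers, body = build_request("my-secret-key", "Great product, fast delivery!")
```

The returned headers and body can be sent with any HTTP client (e.g. `urllib.request` from the standard library), letting the engine's built-in SSL and API-key checks handle transport security and access control.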
Our platform uses two main types of Machine Learning algorithms: Recurrent Neural Networks (RNNs), mostly for classification tasks, and Transformers for more complex (or multi-language) tasks such as Named Entity Recognition. Each approach has its advantages and disadvantages, but one key difference is the hardware it requires. RNNs tend to run faster on CPU-only machines but need rather large training datasets, while Transformers are slower on CPUs and run best in GPU environments but require less training data. This difference is notable when benchmarking the AI Factory engines:
- CPU-only machines are used to benchmark RNN engines, such as Sentiment Analysis. The throughput for this type of engine reached 567 requests per second, equivalent to 1.45 million words per minute, on a commodity CPU-only environment with 16 CPU cores and 32 GB of RAM that costs less than $4,000.
- GPU-enabled machines are used to benchmark Transformer engines, such as Named Entity Recognition. The throughput for this type of engine reached 1,607 responses per minute, equivalent to 293,000 words per minute (WPM), on the same machine, this time equipped with an NVIDIA GeForce RTX 3080 GPU with 16 GB of VRAM. For the cross-language NER engine, which can process more than 40 languages, the throughput is 36,204 words per minute.
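As a quick sanity check on these figures, the request-rate and words-per-minute numbers above imply an average payload size per request. The conversion is simple arithmetic on the reported benchmark values:

```python
# Convert between request throughput and words-per-minute (WPM)
# using the benchmark figures reported above.

def words_per_request(wpm: float, requests_per_minute: float) -> float:
    """Average number of words carried by each request at a given throughput."""
    return wpm / requests_per_minute

# RNN Sentiment Analysis engine: 567 req/s = 34,020 req/min at 1.45M WPM
rnn = words_per_request(1_450_000, 567 * 60)

# Transformer NER engine: 1,607 responses/min at 293,000 WPM
ner = words_per_request(293_000, 1_607)

print(round(rnn, 1))  # ≈ 42.6 words per request
print(round(ner, 1))  # ≈ 182.3 words per response
```

So the sentiment benchmark processed short texts of roughly 40 words per request, while the NER benchmark used longer documents of roughly 180 words each, which is worth keeping in mind when comparing the two throughput numbers.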