Latest AI Research From Intel Explains an Alternative Approach to Training Deep Learning Models for Fast-Paced, Real-World Use Cases Across a Variety of Industries
Object detection encompasses the techniques for detecting, localizing, and classifying objects in an image. The field of artificial intelligence has recently seen many advances thanks to deep learning and image processing: it is now possible to recognize images or even find specific objects inside an image. With deep learning, object detection has become very popular, with several families of models (R-CNN, YOLO, etc.). However, most existing methods in the literature adapt to their training database and fail to generalize when faced with images from different domains.
Although most architectures are optimized for well-known benchmarks, significant results have also been achieved using CNNs for tasks particular to a certain domain. However, these domain-specific solutions are typically well-tuned for a single target dataset, from carefully chosen architectures to training techniques. This style of training has the drawback of over-adapting the approach to one particular dataset. To address this issue, a research team from Intel offers a different strategy, which also serves as the foundation of the Intel® Geti™ platform: a dataset-agnostic template for object detection training, made up of carefully selected and pre-trained models and a reliable training pipeline for further fine-tuning.
The authors experimented with architectures in three categories: lightweight, extremely accurate, and medium, to cover a range of object detection datasets regardless of complexity and object size. Pretrained weights are employed to reach convergence quickly and start from high accuracy. In addition, a data augmentation step enriches images with a random crop, horizontal flip, and brightness and color distortions. Multiscale training was applied to the medium and accurate models to make them more robust, and, to strike a balance between accuracy and complexity, the authors empirically selected a particular input resolution for each model after several trials.

Early stopping and the adaptive ReduceOnPlateau scheduler are also used to end training if a few further epochs do not improve the result. Choosing a suitable "patience" parameter for early stopping and ReduceOnPlateau is difficult in dataset-agnostic training, because the number of iterations per epoch varies significantly with dataset size. The authors proposed an iteration patience parameter to address this issue: it functions like the epoch patience parameter while guaranteeing that a predetermined number of iterations was completed before a non-improving epoch counts against patience. Eleven public datasets with varied domains, numbers of images, classes, object sizes, overall difficulty, and horizontal/vertical alignment were used for training.
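The iteration-patience idea can be sketched in a few lines. This is a hypothetical illustration of the mechanism described above, not Intel's implementation: a non-improving epoch is only counted against patience once a minimum number of training iterations has accumulated since the last best score, so short epochs on small datasets do not trigger premature stops. The class name and thresholds are illustrative assumptions.

```python
class IterationPatienceEarlyStopping:
    """Early stopping that counts "bad" epochs only after enough iterations ran."""

    def __init__(self, epoch_patience=5, iteration_patience=500):
        self.epoch_patience = epoch_patience        # non-improving epochs tolerated
        self.iteration_patience = iteration_patience  # min iterations before counting
        self.best_score = float("-inf")
        self.bad_epochs = 0
        self.iters_since_best = 0

    def step(self, score, iters_this_epoch):
        """Call once per epoch; returns True if training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.bad_epochs = 0
            self.iters_since_best = 0
            return False
        self.iters_since_best += iters_this_epoch
        # Only count this epoch against patience once enough iterations accumulated.
        if self.iters_since_best >= self.iteration_patience:
            self.bad_epochs += 1
        return self.bad_epochs >= self.epoch_patience
```

On a small dataset with, say, 60 iterations per epoch and `iteration_patience=100`, the first non-improving epoch is not yet counted, which is exactly the underfitting guard the authors describe.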
The strategy followed to train all the models is described below:
• begin with the weights that have been trained on the COCO dataset;
• augment images with crop, flip, and photo distortions;
• employ ReduceOnPlateau learning rate scheduler with iteration patience;
• employ Early Stopping to avoid overfitting on large datasets and iteration patience to avoid underfitting on small datasets.
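The augmentation bullet above can be sketched as a framework-free function on an H x W x C image array. This is a minimal illustration under assumed parameters (crop ratios of 80-100% per side, a brightness shift of +/-32), not the pipeline's actual implementation; in a real detector, bounding-box coordinates would be transformed alongside the image.

```python
import random

import numpy as np


def augment(img, rng=random):
    """Random crop, horizontal flip, and brightness distortion (illustrative)."""
    h, w, _ = img.shape
    # Random crop: keep 80-100% of each side, at a random offset.
    ch = int(h * rng.uniform(0.8, 1.0))
    cw = int(w * rng.uniform(0.8, 1.0))
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    img = img[top:top + ch, left:left + cw]
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Brightness distortion: additive shift, clipped back to valid pixel range.
    img = np.clip(img.astype(np.int16) + rng.randint(-32, 32), 0, 255)
    return img.astype(np.uint8)
```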
An ablation experiment was conducted by removing each training trick from the pipeline in turn to determine its effect on final accuracy. According to these tests, each trick contributed about 1 AP (average precision) to the target metric.
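The ablation protocol amounts to retraining with each trick disabled and comparing against the full pipeline. Below is a hypothetical sketch of that loop; `train_and_eval` is a stand-in for the real training-plus-validation run (not Intel's API), and the trick names are illustrative.

```python
# Illustrative trick names; the paper's pipeline components may be named differently.
TRICKS = ["coco_pretrain", "augmentation", "reduce_on_plateau", "early_stopping"]


def ablation(train_and_eval):
    """Return the AP lost when each trick is disabled, relative to the full pipeline."""
    baseline = train_and_eval(disabled=None)  # full pipeline AP
    return {trick: baseline - train_and_eval(disabled=trick) for trick in TRICKS}
```

With the reported result, each entry of the returned dictionary would be roughly 1 AP.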
In this publication, an Intel research team presents a different method for training deep neural network models for dynamic real-world use cases across industries. Specifically, they examined ATSS and FCOS as medium architectures, VFNet, Cascade-RCNN, and Faster-RCNN as accurate models, and SSD and YOLOX as fast architectures for inference. Along the way, they identified techniques for partial optimization that improved average AP scores across the dataset corpus. Finally, the study produced three dataset-independent object detection training templates (one for each performance-accuracy regime), which offer a solid foundation on a wide range of datasets and can be deployed on CPU using the OpenVINO™ toolkit.
Check out the Paper. All Credit For This Research Goes To Researchers on This Project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications and networking systems. His current research concerns computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.