This AI Paper Proposes GhostNetV2 to Enhance Cheap Operation with Long-Range Attention

Deep neural network design is crucial in computer vision for applications such as image recognition, object recognition, and video analysis. AlexNet, GoogLeNet, ResNet, and EfficientNet are just a few of the landmark architectures developed over the past decade, and they have greatly improved performance on many visual tasks.

A model's accuracy is important, but efficiency, especially actual inference latency, becomes the critical concern when deploying neural networks on edge devices such as smartphones and wearables. In these networks, matrix multiplications account for the majority of both the computational cost and the parameters.

Past studies suggest that developing lightweight models is one promising strategy for lowering inference latency. However, the performance gains achievable with convolution-based lightweight models are constrained by their inability to capture long-range dependencies.

In computer vision, transformer-inspired models have recently been introduced, with self-attention modules capable of capturing global information. However, a typical self-attention module has computational complexity that grows quadratically with the size of the feature map, making it impractical for real-world deployment. Computing the attention map also involves many feature-splitting and reshaping operations; while theoretically simple, these operations consume extra memory and add significant latency in practice. Dropping self-attention modules into lightweight models is therefore not mobile-friendly.
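
To make the quadratic scaling concrete, here is a small illustrative Python snippet (a rough sketch, not the paper's analysis): it counts only the two N x N matrix products of a single-head attention layer and ignores projections and softmax. Doubling the feature-map resolution quadruples N and multiplies the attention cost by 16.

```python
def attention_cost(h: int, w: int, channels: int) -> tuple[int, int]:
    """Rough FLOP and memory counts for one self-attention layer."""
    n = h * w                      # number of tokens (pixels)
    flops = 2 * n * n * channels   # QK^T plus the attention-weighted sum of V
    memory = n * n                 # entries in the N x N attention map
    return flops, memory

for size in (32, 64, 128):
    flops, mem = attention_cost(size, size, channels=64)
    print(f"{size}x{size}: {flops:.2e} FLOPs, {mem:.2e} attention entries")
```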

To address these issues, a new study by Huawei, Peking University, and the University of Sydney proposes GhostNetV2, built around a new attention mechanism, DFC (decoupled fully connected) attention, that captures long-range spatial information while maintaining the efficiency of lightweight convolutional neural networks.

The researchers construct attention maps using only fully connected (FC) layers. To aggregate pixels in a CNN's 2D feature map, an FC layer is decomposed into a horizontal FC and a vertical FC. When the two are stacked, each output pixel draws on a long range of positions in both directions, producing a global receptive field. The team then builds on the state-of-the-art GhostNet, applying DFC attention to its intermediate features to address the representation bottleneck. The result is GhostNetV2, a new lightweight vision backbone that offers a better accuracy-latency trade-off than previous architectures.
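
The mechanism can be sketched in a few lines of PyTorch (a minimal sketch, not the authors' released code; the class name DFCAttention, the kernel size of 5, and the 2x downsampling before the attention branch are illustrative assumptions). The horizontal and vertical FC layers are expressed as depthwise 1xK and Kx1 convolutions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DFCAttention(nn.Module):
    """Sketch of decoupled fully connected (DFC) attention."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 5):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        # Horizontal FC: depthwise 1xK conv aggregates along the width.
        self.horizontal = nn.Conv2d(
            out_channels, out_channels, (1, kernel_size),
            padding=(0, kernel_size // 2), groups=out_channels, bias=False)
        # Vertical FC: depthwise Kx1 conv aggregates along the height.
        self.vertical = nn.Conv2d(
            out_channels, out_channels, (kernel_size, 1),
            padding=(kernel_size // 2, 0), groups=out_channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Compute the map at half resolution to cut FLOPs, then gate
        # with a sigmoid and upsample back to the input size.
        attn = F.avg_pool2d(self.proj(x), kernel_size=2, stride=2)
        attn = torch.sigmoid(self.vertical(self.horizontal(attn)))
        return F.interpolate(attn, size=(h, w), mode="nearest")

# Reweight features with the long-range attention map:
x = torch.randn(1, 64, 56, 56)
y = x * DFCAttention(64, 64)(x)
```

Because each depthwise kernel mixes only K positions per output pixel, the cost of this branch grows linearly with the number of pixels rather than quadratically, which is what keeps it mobile-friendly.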

To validate its superiority, the team evaluated GhostNetV2 on standard benchmark datasets (e.g., ImageNet, MS COCO). On the large-scale ImageNet dataset, they compare various approaches to image classification. Against other lightweight models such as GhostNet, MobileNetV2, MobileNetV3, and ShuffleNet, GhostNetV2 achieves markedly higher accuracy at a lower computational cost.

The team also verifies GhostNetV2's generalizability by using it as the backbone in YOLOv3, a lightweight object detection framework, and comparing models with different backbones on the MS COCO dataset. Finally, they conduct a series of ablation experiments to better understand GhostNetV2. The results show that GhostNetV2 outperforms GhostNetV1 across input resolutions. For instance, GhostNetV2 obtains 22.3% mAP, surpassing GhostNetV1 by 0.5 mAP at the same computational cost (i.e., 340M FLOPs with a 320×320 input resolution). From these results, the team concludes that the proposed DFC attention gives the Ghost module a large receptive field and thus yields a more powerful and efficient building block, which is essential for downstream tasks.
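
Putting the pieces together, a GhostNetV2 block gates the output of a Ghost module with the DFC attention map. Below is a minimal sketch reusing the DFCAttention class from above; the class name GhostModuleV2 and the 1:1 split between intrinsic and "ghost" features are illustrative assumptions, not the paper's exact block:

```python
import torch
import torch.nn as nn

class GhostModuleV2(nn.Module):
    """Sketch: a Ghost module whose output is gated by DFC attention."""

    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        primary_ch = out_ch // ratio     # "intrinsic" feature maps
        cheap_ch = out_ch - primary_ch   # "ghost" feature maps
        # Primary 1x1 convolution produces a reduced set of features.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, 1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True))
        # Cheap depthwise 3x3 operation generates the remaining maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, cheap_ch, 3, padding=1,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True))
        # Long-range branch: the DFCAttention sketch from above.
        self.attention = DFCAttention(in_ch, out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        primary = self.primary(x)
        ghost = torch.cat([primary, self.cheap(primary)], dim=1)
        # Cheap local features, reweighted by the global attention map.
        return ghost * self.attention(x)

block = GhostModuleV2(64, 64)
out = block(torch.randn(1, 64, 56, 56))  # -> (1, 64, 56, 56)
```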


Check out the Paper, Project, and GitHub. All credit for this research goes to the researchers on this project.


Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast with a keen interest in the applications of artificial intelligence across various fields, and she is passionate about exploring new advances in technology and their real-life applications.

