Through an ablation study, we show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention even without retraining the network. Accordingly, we design novel plug-and-play modules and embed them into the head of a CNN, forming a network we refer to as FsaNet.
The frequency self-attention 1) takes low-frequency coefficients as input, 2) is mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token-mapping (1×1 convolution) and token-mixing stages simultaneously.
We show that the frequency self-attention modules have the lowest complexity, requiring 87.29% ∼ 90.04% less memory, 96.13% ∼ 98.07% fewer FLOPs, and 97.56% ∼ 98.18% less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result (83.0% mIoU) on the Cityscapes test set and competitive results on ADE20k and VOCaug.
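The core idea described above (attending only over low-frequency coefficients of a feature map) can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the actual FsaNet implementation: it uses a hypothetical `freq_self_attention` function with an orthonormal 2D DCT, plain dot-product attention over the retained low-frequency block, and an inverse DCT back to the spatial domain. The cost saving comes from attending over `keep × keep` tokens instead of `H × W`.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n); rows are frequency basis vectors.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def freq_self_attention(x, keep):
    # x: (H, W, C) feature map; keep: low-frequency rows/cols retained (assumed API).
    h, w, c = x.shape
    Dh, Dw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT along the spatial dimensions.
    f = np.einsum('hi,iwc->hwc', Dh, np.einsum('wj,hjc->hwc', Dw, x))
    # Keep only the top-left (low-frequency) block as attention tokens.
    low = f[:keep, :keep, :].reshape(keep * keep, c)
    # Plain scaled dot-product self-attention over the low-frequency tokens.
    scores = low @ low.T / np.sqrt(c)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    out = (attn @ low).reshape(keep, keep, c)
    # Write the attended block back and invert the DCT (orthonormal: inverse = transpose).
    f2 = f.copy()
    f2[:keep, :keep, :] = out
    y = np.einsum('hi,iwc->hwc', Dh.T, np.einsum('wj,hjc->hwc', Dw.T, f2))
    return y
```

With `keep = H/2`, the attention matrix shrinks by a factor of 16 relative to full spatial attention, which is the source of the memory and FLOP reductions the abstract reports.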
Short bio of the speaker:
Fengyu Zhang received the B.S. degree in communication engineering from Guangxi University, China, in 2013, and the M.S. degree in traffic information engineering from Central South University, Changsha, China, in 2016. She was a software designer at CRRC Ltd. (a company engaged in the development and manufacture of rail transit vehicles) from 2016 to 2019. She is currently a Ph.D. student in the Department of Traffic Information Engineering, Central South University, and a visiting Ph.D. student in the Computer Science Department at Chalmers University. Her research interests include signal processing and machine learning, especially computer vision and image processing.
28 November, 2022, 14:00–15:00