Concepedia

Publication | Closed Access

Knowledge Adaptation for Efficient Semantic Segmentation

Citations: 224 | References: 31 | Year: 2019

TLDR

Semantic segmentation demands both high accuracy and computational efficiency. Existing fully convolutional networks (FCNs) rely on high-resolution feature maps that are computationally expensive, yet simply reducing resolution to speed up inference sharply degrades accuracy. To resolve this trade-off, the paper proposes a knowledge-distillation framework that improves compact FCNs with a large overall stride. The framework aligns teacher and student features in a latent domain defined by a pre-trained autoencoder, and adds an affinity distillation module that captures long-range dependencies through non-local interactions across the whole image. Its effectiveness is validated on Pascal VOC, Cityscapes, and Pascal Context. On the Cityscapes test set, the method raises a student network's mIoU from 70.2 to 72.7, a gain of 2.5 points, and it trains a compact model that needs only 8% of the FLOPs of a model with comparable accuracy.

Abstract

Both accuracy and efficiency are of significant importance to the task of semantic segmentation. Existing deep FCNs suffer from heavy computation because they keep a series of high-resolution feature maps to preserve the detailed knowledge needed for dense estimation. Although reducing the feature map resolution (i.e., applying a large overall stride) via subsampling operations (e.g., pooling and convolution striding) can immediately increase efficiency, it dramatically decreases the estimation accuracy. To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of compact FCNs with a large overall stride. To handle the inconsistency between the features of the student and teacher networks, we optimize the feature similarity in a transferred latent domain formulated by utilizing a pre-trained autoencoder. Moreover, an affinity distillation module is proposed to capture long-range dependencies by calculating the non-local interactions across the whole image. To validate the effectiveness of our proposed method, extensive experiments have been conducted on three popular benchmarks: Pascal VOC, Cityscapes, and Pascal Context. Built upon a highly competitive baseline, our proposed method improves the performance of a student network by 2.5 points (mIoU rises from 70.2 to 72.7 on the Cityscapes test set) and can train a better compact model with only 8% of the floating-point operations (FLOPs) of a model that achieves comparable performance.
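The affinity distillation idea can be illustrated with a minimal sketch. It assumes that affinity is measured as cosine similarity between every pair of spatial positions and that the student is penalized by the mean squared difference between its affinity matrix and the teacher's; the abstract specifies neither choice, and the function names below are hypothetical, not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def affinity_matrix(features):
    """Pairwise cosine similarity between all spatial positions.

    `features` is a list of N feature vectors, one per spatial position
    (a flattened H*W feature map). The result is an N x N matrix, so it
    relates every position to every other position in the image.
    """
    unit = [l2_normalize(f) for f in features]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def affinity_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher affinity matrices.

    Assumption: MSE is used here for illustration only.
    Both inputs must cover the same number of spatial positions.
    """
    a_s = affinity_matrix(student_feats)
    a_t = affinity_matrix(teacher_feats)
    n = len(a_s)
    total = sum((a_s[i][j] - a_t[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


# Toy data: 3 spatial positions; teacher has 3 channels, student has 2.
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
student_good = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # same pairwise structure
student_bad = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # all positions identical

loss_good = affinity_distillation_loss(student_good, teacher)
loss_bad = affinity_distillation_loss(student_bad, teacher)
```

Because the affinity matrix is N x N regardless of channel count, a loss of this form can compare a narrow student with a wide teacher without any channel-matching layer, as long as both feature maps cover the same number of positions. In the toy data, `student_good` reproduces the teacher's pairwise structure and gets a near-zero loss, while `student_bad` collapses all positions together and is penalized.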
