Perceptual Generative Adversarial Networks for Small Object Detection
2017-07-11 19:47:46 CVPR 2017
This paper uses a GAN to address small object detection, which is a very hard problem in general object detection. As shown in the following figures, small objects and large objects usually have different representations at the feature level.
Thus, it is possible to use a Perceptual GAN to super-resolve the feature maps of small objects and obtain better detection performance.
It consists of two subnetworks, i.e., a generator network and a perceptual discriminator network. Specifically, the generator is a deep residual based feature generative model which transforms the original poor features of small objects to highly discriminative ones by introducing fine-grained details from lower-level layers, achieving “super-resolution” on the intermediate representations.
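The residual idea above can be sketched in a few lines. This is only a toy illustration, not the paper's network: the feature vectors are plain Python lists, and `residual_branch` with its `weights` argument is a hypothetical stand-in for the learned deep residual blocks that read lower-level features.

```python
def residual_branch(low_level_feature, weights):
    # Hypothetical stand-in for the learned residual blocks:
    # here just an elementwise scaling of the lower-level feature.
    return [w * x for w, x in zip(weights, low_level_feature)]


def generate_super_resolved(small_feature, low_level_feature, weights):
    # The generator does not produce the whole representation from scratch.
    # It adds a residual, computed from lower-level features, to the
    # original (poor) feature of the small object.
    residual = residual_branch(low_level_feature, weights)
    return [f + r for f, r in zip(small_feature, residual)]


small = [1.0, 2.0]        # poor feature of a small object (toy values)
low_level = [10.0, 20.0]  # fine-grained lower-level feature (toy values)
enhanced = generate_super_resolved(small, low_level, weights=[0.5, 0.5])
print(enhanced)
```

With all weights set to zero the residual vanishes and the output equals the input feature, which is why learning only the residual is an easier target than generating the complete representation.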
Unlike a normal GAN, this network also introduces a new perceptual loss tailored to the detection task. That is, the discriminator not only has to handle the adversarial loss, but also has to justify, through a perceptual loss, the detection accuracy gained from the generated super-resolved features.
The proposed contributions:
(1) We are the first to successfully apply GAN-alike models to solve the challenging small-scale object detection problems.
(2) We introduce a new conditional generator model that learns the additive residual representation between large and small objects, instead of generating the complete representations as before.
(3) We introduce a new perceptual discriminator that provides more comprehensive supervision beneficial for detections, instead of barely differentiating fake and real.
(4) Successful applications on traffic sign detection and pedestrian detection have been achieved with the state-of-the-art performance.
Figure 2. Training procedure of object detection network based on the Perceptual GAN.
As shown in Figure 2, the generator network aims to generate a super-resolved representation for the small object.
The discriminator includes two branches:
1. the adversarial branch, for differentiating between the generated super-resolved representation of small objects and the original representation of real large objects.
2. the perception branch, for justifying the detection accuracy gained from the generated representation.
==>> Discriminative Network Architecture:
The D network needs to justify the detection accuracy gained from the generated super-resolved features.
Given the adversarial loss $L_{dis_a}$ and the perceptual loss $L_{dis_p}$, the final loss $L_{dis}$ is a weighted sum of the two components. With weighting parameters $w_1$ and $w_2$, we define $L_{dis} = w_1 \times L_{dis_a} + w_2 \times L_{dis_p}$ to encourage the generator network to generate super-resolved representations with high detection accuracy. Here both $w_1$ and $w_2$ are set to one.
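A minimal sketch of this weighted sum follows. The function names are made up for illustration. The adversarial term assumes the standard GAN discriminator loss on a single real/fake probability pair, and the perceptual term is just a placeholder number standing in for a detection loss; neither is taken from the authors' code.

```python
import math


def adversarial_loss(d_real, d_fake, eps=1e-12):
    # Assumed standard GAN discriminator loss for one pair:
    # d_real is D's probability for a real large-object feature,
    # d_fake is D's probability for a generated super-resolved feature.
    return -math.log(d_real + eps) - math.log(1.0 - d_fake + eps)


def discriminator_loss(l_dis_a, l_dis_p, w1=1.0, w2=1.0):
    # L_dis = w1 * L_dis_a + w2 * L_dis_p; both weights default to one,
    # matching the setting described in the text.
    return w1 * l_dis_a + w2 * l_dis_p


l_a = adversarial_loss(d_real=0.9, d_fake=0.2)
l_p = 0.5  # placeholder value for the perceptual (detection) loss
total = discriminator_loss(l_a, l_p)
print(total)
```

With $w_1 = w_2 = 1$ the total is simply the sum of the two terms, so neither branch is favored over the other.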