Compared with Faster R-CNN
, our method and model are more suitable for small target detection.
The Mask R-CNN
 network builds upon the Faster R-CNN
 network with the addition of segmentation, whose purpose is to predict the object mask.
In the CCNN structure, the first-level network uses the CNN image classification model, and the second-level network uses the Faster R-CNN
object detection model.
The latest representative work is the Mask R-CNN
 which improved on Faster R-CNN
 proposed R-CNN
, where the regions are generated by some oversegmentation algorithms such as the selective search  and the CNN is fine-tuned with these region proposals.
The rest of the paper is organized as follows: Section 2 briefly reviews some related work about vehicle detection with CNN from UAV images, followed by the methodological details of the Faster R-CNN
 in Section 3.
For person detection, we use the Faster R-CNN
 convolutional neural network based on an object detector and an ASSL framework that is implemented on the popular Caffe deep learning library .
3D RoIAlign inherits the advantages of Mask R-CNN
, it predicts the 3D mask of instances for each class independently, this independency guarantees robust segmentation for 3D amodal without artifacts on overlapping instances.
Sun, "Faster R-CNN
: Towards real-time object detection with region proposal networks," Advances in neural information processing systems (2015), pp.
Caption: Figure 6: Traffic light detection examples of top-ranked RGB based Faster R-CNN
model with Inception-Resnet- v2.
(5.) Ren S., He K., Girshick R., and Sun J., "Faster R-CNN
: Towards realtime object detection with region proposal networks," in Advances in Neural Information Processing Systems (NIPS), 2015.
Test time per image 50 s 2 s 0.2 s Speedup 1x 25x 250x mAP (VOC2007) 66.0 66.9 73.2