Object detection

We should be able to detect multiple objects, belonging to multiple classes, in the same image. A two-stage detector does this in two steps:

1) Generate region proposals (image patches that are likely to contain objects)

2) Perform classification over the proposals
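The propose-then-classify pipeline can be sketched on a toy example. Everything below is a hypothetical illustration and not a real detector: the "image" is a small grid whose non-zero values stand in for class ids, `propose` keeps sliding-window patches that contain anything, and `classify` labels a patch by its most common non-zero value.

```python
from collections import Counter


def propose(image, size=2, stride=2):
    """Stage 1 (toy): yield square patches (x1, y1, x2, y2) containing any non-zero pixel."""
    height, width = len(image), len(image[0])
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            patch = [image[j][i] for j in range(y, y + size) for i in range(x, x + size)]
            if any(patch):
                yield (x, y, x + size, y + size)


def classify(image, box):
    """Stage 2 (toy): label a proposal by its most common non-zero pixel value."""
    x1, y1, x2, y2 = box
    values = [image[j][i] for j in range(y1, y2) for i in range(x1, x2) if image[j][i]]
    label, count = Counter(values).most_common(1)[0]
    score = count / ((x2 - x1) * (y2 - y1))  # fraction of the patch covered by that class
    return label, score


def detect(image, min_score=0.5):
    """Run both stages and keep confident (box, label, score) detections."""
    detections = []
    for box in propose(image):
        label, score = classify(image, box)
        if score >= min_score:
            detections.append((box, label, score))
    return detections


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 2, 2],
]
detections = detect(image)
```

In a real detector both stages are learned networks, but the control flow is the same: a cheap first stage narrows the image down to a few candidate boxes, and a more expensive second stage decides what each box contains.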

Faster R-CNN

Here is the Faster R-CNN object detector (0.2 s per image for 38 classes)

Training uses 4 losses:

1) Objectness loss in the region proposal network (RPN): object vs. not object. Although this is a binary decision, the paper uses a two-class softmax loss.

2) Bounding box regression loss in the RPN

3) Classification loss of the detection head

4) Bounding box adjustment (regression) loss of the detection head
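The four loss terms can be written out for a single anchor and a single proposal. This is a minimal sketch under several assumptions that are not in the notes: the function and argument names are made up, class index 0 stands for background, the four terms are summed with equal weight, and smooth L1 is used for both regression terms (as in the Fast/Faster R-CNN papers). Real implementations average these terms over mini-batches of sampled anchors and proposals.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against the integer class `target`."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def smooth_l1(pred, target):
    """Smooth L1 summed over box coordinates: quadratic near zero, linear elsewhere."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


def faster_rcnn_losses(rpn_logits, rpn_label, rpn_box, rpn_box_gt,
                       cls_logits, cls_label, det_box, det_box_gt):
    """Return the four loss terms and their (assumed equal-weight) sum."""
    # 1) RPN objectness: two-class softmax (0 = not object, 1 = object).
    rpn_cls = softmax_cross_entropy(rpn_logits, rpn_label)
    # 2) RPN box regression: only counted for positive anchors.
    rpn_reg = smooth_l1(rpn_box, rpn_box_gt) if rpn_label == 1 else 0.0
    # 3) Detection-head classification over background + object classes.
    det_cls = softmax_cross_entropy(cls_logits, cls_label)
    # 4) Detection-head box adjustment: only counted for non-background proposals.
    det_reg = smooth_l1(det_box, det_box_gt) if cls_label != 0 else 0.0
    return {
        "rpn_cls": rpn_cls,
        "rpn_reg": rpn_reg,
        "det_cls": det_cls,
        "det_reg": det_reg,
        "total": rpn_cls + rpn_reg + det_cls + det_reg,
    }


losses = faster_rcnn_losses(
    rpn_logits=[0.0, 0.0], rpn_label=1,
    rpn_box=[0.5, 0.0, 0.0, 0.0], rpn_box_gt=[0.0, 0.0, 0.0, 0.0],
    cls_logits=[0.0, 0.0, 0.0], cls_label=2,
    det_box=[2.0, 0.0, 0.0, 0.0], det_box_gt=[0.0, 0.0, 0.0, 0.0],
)
```

Losses 1 and 2 train the region proposal network, while losses 3 and 4 train the head that classifies and refines each proposal.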

YOLO / SSD

You Only Look Once / Single Shot MultiBox Detector

Here the region proposal layer outputs class scores instead of a binary object / not-object score. Therefore, the whole detector is a single feed-forward neural network (possibly with skip connections), with no separate classification stage.
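The difference shows up in the number of output channels the prediction layer needs at each feature-map location. The sketch below is illustrative only: the function names are hypothetical, and the single-shot variant assumes an SSD-style head with one extra background score per anchor (YOLO organizes its outputs differently).

```python
def rpn_head_channels(anchors_per_location):
    """RPN: a two-way object / not-object score and 4 box offsets per anchor."""
    return {
        "scores": 2 * anchors_per_location,
        "box_offsets": 4 * anchors_per_location,
    }


def single_shot_head_channels(anchors_per_location, num_classes):
    """Single-shot head: per-class scores (plus assumed background) and 4 box offsets per anchor."""
    return {
        "scores": (num_classes + 1) * anchors_per_location,
        "box_offsets": 4 * anchors_per_location,
    }


rpn = rpn_head_channels(anchors_per_location=9)
single_shot = single_shot_head_channels(anchors_per_location=9, num_classes=20)
```

The box-offset outputs are identical in both cases. Only the score outputs grow with the number of classes, which is what lets a single-shot detector skip the second classification stage.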

Anchor generation and ground-truth (GT) matching are similar to Faster R-CNN.
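Anchor generation and GT matching can be sketched as follows. This is a simplified illustration with hypothetical function names. It tiles anchors over a feature map and labels each anchor by its best IoU with any GT box, using the 0.7 / 0.3 thresholds from the Faster R-CNN paper. The paper's additional rule, which also marks the highest-IoU anchor for each GT box as positive, is omitted for brevity.

```python
import math


def generate_anchors(feat_w, feat_h, stride, scales, ratios):
    """Return one (x1, y1, x2, y2) anchor per feature-map cell, scale and aspect ratio."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = width / height, with the area kept at scale ** 2.
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def area(box):
    """Area of a box, or 0 if the box is empty or inverted."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: (1, gt_index) positive, (0, None) negative, (-1, None) ignored."""
    matches = []
    for anchor in anchors:
        best_iou, best_gt = 0.0, None
        for index, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, index
        if best_iou >= pos_thresh:
            matches.append((1, best_gt))
        elif best_iou < neg_thresh:
            matches.append((0, None))
        else:
            matches.append((-1, None))  # ambiguous overlap: excluded from training
    return matches


anchors = generate_anchors(feat_w=2, feat_h=2, stride=16, scales=[16], ratios=[1.0])
matches = match_anchors(anchors, gt_boxes=[(0, 0, 16, 16)])
```

Positive anchors contribute to both the score and box-regression losses, negative anchors contribute only to the score loss, and ignored anchors contribute to neither.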

It is much faster than Faster R-CNN but less accurate, which makes it suitable for real-time applications.

Important paper: Speed/accuracy trade-offs for modern convolutional object detectors

This paper helps you decide which architecture to choose for your application.

Using region proposal features for image captioning

Dense captioning architecture

3D object detection

TODO