Object detection

We should be able to detect multiple objects belonging to multiple classes in the same image

Generate region proposals (Image patches that are likely to contain objects)

Perform classification over proposals

Faster-RCNN

The Faster R-CNN object detector (0.2s per image for 38 classes)

It is trained with 4 losses

1) Objectness loss in the region proposal network (RPN): object or not object. Although this is a binary decision, the paper uses a two-class softmax loss

2) Bounding box regression loss in the RPN

3) Classification loss over the object classes for each proposal

4) Bounding box adjustment regression loss that refines the box of each classified proposal
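The four terms above can be sketched for a single anchor and a single proposal as follows. This is a minimal illustration, not the paper's training code. It assumes equal weighting of the four terms, a smooth L1 regression loss, label 1 for "object" in the RPN, and label 0 for "background" in the final classifier. All function and parameter names are hypothetical.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against the integer class `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def smooth_l1(pred, target):
    """Smooth L1 loss summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def detection_loss(rpn_logits, rpn_label, rpn_box, rpn_box_gt,
                   cls_logits, cls_label, box, box_gt):
    """Sum of the four losses (equal weights are an assumption)."""
    # 1) object / not object in the RPN, written as a two-class softmax
    rpn_cls = softmax_cross_entropy(rpn_logits, rpn_label)
    # 2) RPN box regression, only for anchors labelled as objects
    rpn_reg = smooth_l1(rpn_box, rpn_box_gt) if rpn_label == 1 else 0.0
    # 3) classification over the object classes (class 0 = background here)
    det_cls = softmax_cross_entropy(cls_logits, cls_label)
    # 4) box adjustment, only for non-background proposals
    det_reg = smooth_l1(box, box_gt) if cls_label > 0 else 0.0
    return rpn_cls + rpn_reg + det_cls + det_reg
```

In practice the terms are averaged over a mini-batch of anchors and proposals, and the regression terms are usually scaled by a balancing weight.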

YOLO / SSD

You Only Look Once / Single Shot MultiBox Detector

Here the region proposal layer outputs per-class scores instead of a binary object score, so no separate classification stage is needed. Therefore, we just have a single feed-forward neural network (possibly with skip connections)
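The difference shows up in how many numbers the prediction layer outputs at each feature-map location. The sketch below is only arithmetic under stated assumptions: k anchors per location, 4 box coordinates per anchor, a two-class softmax for the RPN, and one extra background class for the single-shot head. The function names are hypothetical.

```python
def rpn_outputs_per_location(num_anchors):
    """RPN: 2 scores (object / not object) and 4 box offsets per anchor."""
    return {"scores": 2 * num_anchors, "box": 4 * num_anchors}

def single_shot_outputs_per_location(num_anchors, num_classes):
    """Single-shot head: one score per class plus background, and 4 box offsets per anchor."""
    return {"scores": num_anchors * (num_classes + 1), "box": 4 * num_anchors}

# Example with 9 anchors per location and 20 object classes
print(rpn_outputs_per_location(9))              # {'scores': 18, 'box': 36}
print(single_shot_outputs_per_location(9, 20))  # {'scores': 189, 'box': 36}
```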

The anchor generation and GT matching are similar to Faster RCNN.
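Anchor generation and ground-truth (GT) matching can be sketched as below. This is a simplified illustration, not any library's implementation. The thresholds 0.7 and 0.3 follow the values commonly quoted for the Faster R-CNN RPN and are treated here as adjustable assumptions. The extra rule that the best anchor for each GT box is always positive is left out. All names are hypothetical.

```python
def make_anchors(feat_h, feat_w, stride, sizes, ratios):
    """Return (x1, y1, x2, y2) anchors centred on every feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:  # ratio = width / height
                    w = size * ratio ** 0.5
                    h = size / ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Per anchor: index of the matched GT box, -1 for background, None for ignored."""
    labels = []
    for anchor in anchors:
        overlaps = [iou(anchor, gt) for gt in gt_boxes]
        best_iou = max(overlaps, default=0.0)
        if best_iou >= pos_thresh:
            labels.append(overlaps.index(best_iou))  # positive anchor
        elif best_iou < neg_thresh:
            labels.append(-1)                        # background anchor
        else:
            labels.append(None)                      # ignored during training
    return labels
```

Positive anchors contribute to both the score and box losses, background anchors only to the score loss, and ignored anchors to neither.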

It is much faster than Faster RCNN but generally less accurate, which makes it suitable for real-time applications

Important paper: Speed/accuracy trade-offs for modern convolutional object detectors

It helps you decide which architecture to choose for your application

Using region proposal features for image captioning

Dense captioning architecture

3D object detection

TODO