We should be able to detect multiple objects, belonging to multiple classes, in the same image.
Generate region proposals (image patches that are likely to contain objects).
Classify each proposal.
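The two-stage idea (propose, then classify) can be sketched as a small control-flow skeleton. This is a minimal sketch, not any particular detector: `propose` and `classify` are hypothetical callables standing in for a region-proposal method and a patch classifier, and the `(x1, y1, x2, y2)` box convention, the `"background"` label and the `score_threshold` parameter are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x1, y1, x2, y2)


def detect(image,
           propose: Callable[[object], List[Box]],
           classify: Callable[[object, Box], Tuple[str, float]],
           score_threshold: float = 0.5) -> List[Tuple[Box, str, float]]:
    """Two-stage detection: generate proposals, then classify each one.

    `propose` and `classify` are hypothetical stand-ins, not a real API.
    """
    detections = []
    for box in propose(image):
        label, score = classify(image, box)
        # Keep only confident predictions that are not background.
        if label != "background" and score >= score_threshold:
            detections.append((box, label, score))
    return detections


# Toy usage with made-up proposals and a lookup-table "classifier".
boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]
toy_propose = lambda img: boxes
_table = {boxes[0]: ("cat", 0.9), boxes[1]: ("dog", 0.3), boxes[2]: ("background", 0.95)}
toy_classify = lambda img, box: _table[box]

print(detect(None, toy_propose, toy_classify))  # [((0, 0, 10, 10), 'cat', 0.9)]
```

Because the image is processed once per proposal, the cost grows with the number of proposals, which is what single-shot detectors avoid.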
Here is the Faster R-CNN object detector (0.2 s per image for 38 classes).
It is trained with four losses:
1) Objectness (object vs. not object) loss in the Region Proposal Network (RPN). Although the task is binary, the paper uses a two-class softmax loss.
2) Bounding box regression loss in the RPN
3) Classification loss in the detection head
4) Bounding box refinement (regression) loss in the detection head
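The four losses above can be sketched for a single anchor and a single region of interest (RoI) using softmax cross-entropy for the classification terms and smooth L1 for the regression terms. This is a simplified, unnormalized sketch under stated assumptions: the real objective averages over mini-batches of anchors and RoIs with separate normalizers, and the balancing weight `lam` and the helper names here are hypothetical. The last lines check the point made in item 1): a two-class softmax gives the same probability as a sigmoid on the difference of the two logits.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    # Numerically stable log-sum-exp: subtract the max logit first.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    # Quadratic for small errors, linear for large ones (less outlier-sensitive than L2).
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def detection_loss(rpn_obj_logits, is_object, rpn_box_pred, rpn_box_target,
                   cls_logits, cls_target, box_pred, box_target, lam=1.0):
    """Hypothetical single-sample sum of the four Faster R-CNN-style losses."""
    # 1) RPN objectness: two-class softmax (index 0 = background, 1 = object).
    l1 = softmax_cross_entropy(rpn_obj_logits, int(is_object))
    # 2) RPN box regression, counted only for positive (object) anchors.
    l2 = smooth_l1(rpn_box_pred, rpn_box_target) if is_object else 0.0
    # 3) Detection-head classification over C + 1 classes (0 = background).
    l3 = softmax_cross_entropy(cls_logits, cls_target)
    # 4) Detection-head box refinement, counted only for foreground RoIs.
    l4 = smooth_l1(box_pred, box_target) if cls_target != 0 else 0.0
    return l1 + lam * l2 + l3 + lam * l4


# A two-class softmax equals a sigmoid applied to the logit difference.
z_bg, z_obj = 0.2, 1.5
p_softmax = math.exp(z_obj) / (math.exp(z_bg) + math.exp(z_obj))
p_sigmoid = 1.0 / (1.0 + math.exp(-(z_obj - z_bg)))
assert abs(p_softmax - p_sigmoid) < 1e-12
```

The regression terms are gated on positive samples because box offsets are undefined for background.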
You Only Look Once (YOLO) / Single Shot MultiBox Detector (SSD)
Here the region proposal layer outputs class scores instead of a binary object/not-object score. Therefore, the whole detector is a single feed-forward neural network (possibly with skip connections), with no separate proposal stage.
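To make the difference concrete, here is a small calculation of how many output channels an SSD-style convolutional prediction head needs at each feature-map location. An RPN-style head predicts 2 objectness scores plus 4 box offsets per anchor, while a single-shot head predicts C + 1 scores (C classes plus background) plus 4 offsets per anchor. The helper `head_channels` is hypothetical, and `k = 9` anchors per location and `C = 20` classes are assumed values, not taken from the notes.

```python
def head_channels(num_anchors: int, num_scores: int, box_coords: int = 4) -> int:
    # Each anchor at a feature-map cell predicts its scores plus box offsets.
    return num_anchors * (num_scores + box_coords)


k = 9   # anchors per location (assumed)
C = 20  # number of object classes (assumed)

rpn_channels = head_channels(k, 2)              # object / not-object
single_shot_channels = head_channels(k, C + 1)  # C classes + background

print(rpn_channels, single_shot_channels)  # 54 225
```

Only the number of score channels changes; the per-anchor box regression stays the same, which is why the anchor machinery carries over from Faster R-CNN.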
Anchor generation and ground-truth (GT) matching are similar to Faster R-CNN.
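Anchor generation and GT matching can be sketched as follows: place anchors of several scales and aspect ratios at every feature-map cell, then label each anchor by its best intersection-over-union (IoU) with the GT boxes. This is a minimal sketch, assuming `(x1, y1, x2, y2)` boxes; the default thresholds 0.7 / 0.3 follow the Faster R-CNN RPN (SSD uses its own matching threshold), and the label encoding (GT index, `-1` background, `-2` ignored) is a hypothetical choice for illustration. The final loop ensures every GT box gets at least its best-overlapping anchor, even if that IoU is below the positive threshold.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x1, y1, x2, y2)


def generate_anchors(feat_h: int, feat_w: int, stride: float,
                     scales: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    anchors = []
    for i, j in itertools.product(range(feat_h), range(feat_w)):
        # Anchor centre in image coordinates for feature-map cell (i, j).
        cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
        for s, r in itertools.product(scales, ratios):
            # Keep area s * s while setting aspect ratio r = w / h.
            w, h = s * math.sqrt(r), s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors: List[Box], gt_boxes: List[Box],
                  pos_thresh: float = 0.7, neg_thresh: float = 0.3) -> List[int]:
    """Per-anchor label: matched GT index (>= 0), -1 background, -2 ignored."""
    labels = []
    for a in anchors:
        best_iou, best_gt = 0.0, -1
        for g_idx, g in enumerate(gt_boxes):
            v = iou(a, g)
            if v > best_iou:
                best_iou, best_gt = v, g_idx
        if best_iou >= pos_thresh:
            labels.append(best_gt)
        elif best_iou < neg_thresh:
            labels.append(-1)
        else:
            labels.append(-2)  # ambiguous overlap: excluded from the loss
    # Every GT box keeps at least its single best-overlapping anchor.
    for g_idx, g in enumerate(gt_boxes):
        if anchors:
            best_a = max(range(len(anchors)), key=lambda n: iou(anchors[n], g))
            labels[best_a] = g_idx
    return labels


anchors = generate_anchors(2, 2, stride=16, scales=[16], ratios=[1.0])
print(match_anchors(anchors, [(4, 0, 20, 16)]))  # [0, -1, -1, -1]
```

In the usage example the first anchor overlaps the GT box with IoU 0.6, which falls between the thresholds, so it is only kept as a positive because of the best-anchor rule.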
It is much faster than Faster R-CNN but less accurate, which makes it suitable for real-time applications.
Helps you decide which architecture to choose for your application.
Dense captioning architecture
TODO