COCO dataset (2014)

330K images (>200K labeled)
1.5 million object instances
80 object categories
91 stuff categories
5 captions per image. A caption is a short textual description of the image.

So they have relatively few object categories, but their focus seems to be on images that contain several different objects at once. E.g. they have 13 photos that contain both a cat and a pizza. Searching for such weird combinations is kind of fun.
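That kind of combination search can also be sketched offline. This is a minimal sketch, assuming the instance annotations follow the usual COCO JSON layout with top-level `images`, `categories` and `annotations` lists. The `toy` data, its IDs and file names, and the helper `images_with_all` are all made up for illustration, not taken from the dataset.

```python
from collections import defaultdict

# Tiny hand-made stand-in for a COCO instances file.
# IDs and file names are invented, not real COCO values.
toy = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
        {"id": 3, "file_name": "c.jpg"},
    ],
    "categories": [
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "pizza"},
        {"id": 3, "name": "dog"},
    ],
    "annotations": [
        {"image_id": 1, "category_id": 1},  # cat
        {"image_id": 1, "category_id": 2},  # pizza
        {"image_id": 2, "category_id": 1},  # cat only
        {"image_id": 3, "category_id": 2},  # pizza
        {"image_id": 3, "category_id": 3},  # dog
    ],
}


def images_with_all(dataset, names):
    """Return sorted file names of images containing every category in `names`."""
    name_to_id = {c["name"]: c["id"] for c in dataset["categories"]}
    wanted = {name_to_id[n] for n in names}
    present = defaultdict(set)  # image_id -> category ids seen in that image
    for ann in dataset["annotations"]:
        present[ann["image_id"]].add(ann["category_id"])
    return sorted(
        img["file_name"]
        for img in dataset["images"]
        if wanted <= present[img["id"]]
    )


print(images_with_all(toy, ["cat", "pizza"]))  # ['a.jpg']
```

With a real annotation file loaded through `json.load`, the same function should work unchanged, as long as the file really has that layout.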

Their official dataset explorer is actually good: cocodataset.org/#explore

And the objects don't just have bounding boxes, but also detailed segmentation polygons.
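To make the box vs. polygon difference concrete, here is a small sketch. It assumes a polygon is stored as a flat `[x1, y1, x2, y2, ...]` list and a box as `[x, y, width, height]`, which is how COCO annotations are usually laid out. The helper names and the triangle are made up for illustration.

```python
def polygon_bbox(flat):
    """Bounding box [x, y, width, height] of a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def polygon_area(flat):
    """Polygon area computed with the shoelace formula."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


# Made-up right triangle, not taken from COCO.
triangle = [10, 10, 50, 10, 10, 40]

print(polygon_bbox(triangle))  # [10, 10, 40, 30]
print(polygon_area(triangle))  # 600.0
```

The box covers 40 * 30 = 1200 square pixels while the triangle only covers 600, so the polygon says a lot more about where the object actually is.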

Also, images have captions describing the relation between objects:

a black and white cat standing on a table next to a pizza.

Epic.
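Captions can be searched in a similar way. This is a rough sketch, assuming caption annotations are a list of records with `image_id` and `caption` fields. Only the first caption below comes from the dataset; the other two, the image IDs and the helper names are invented for illustration.

```python
from collections import defaultdict

# Toy caption annotations. Only the first caption is a real one quoted above;
# the rest are made up.
toy_captions = {
    "annotations": [
        {"image_id": 1, "caption": "a black and white cat standing on a table next to a pizza."},
        {"image_id": 1, "caption": "a cat looking at a pizza."},
        {"image_id": 2, "caption": "a dog on a beach."},
    ]
}


def captions_by_image(dataset):
    """Group caption strings by the image they describe."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)


def images_mentioning(dataset, *words):
    """Sorted image ids that have at least one caption containing all `words`."""
    hits = set()
    for image_id, captions in captions_by_image(dataset).items():
        if any(all(w.lower() in c.lower() for w in words) for c in captions):
            hits.add(image_id)
    return sorted(hits)


print(images_mentioning(toy_captions, "cat", "pizza"))  # [1]
```

This is plain substring matching, so "cat" would also match "catalog". Good enough for playing around, not for anything serious.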

This dataset is kind of cool.

Original 2014 paper by Microsoft: arxiv.org/abs/1405.0312

Table of contents
- COCO subset
  - COCO 2017
