Subset of ImageNet. About 167.62 GB in size according to www.kaggle.com/competitions/imagenet-object-localization-challenge/data.
Contains 1,281,167 training images and exactly 1k categories, which is why this dataset is also known as ImageNet1k: datascience.stackexchange.com/questions/47458/what-is-the-difference-between-imagenet-and-imagenet1k-how-to-download-it
www.kaggle.com/competitions/imagenet-object-localization-challenge/overview clarifies a bit further how the categories are inter-related according to WordNet relationships:
The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other.
image-net.org/challenges/LSVRC/2012/browse-synsets.php lists all 1k labels with their WordNet IDs.
n02119789: kit fox, Vulpes macrotis
n02100735: English setter
n02096294: Australian terrier
However, there is a bug towards the middle of that page:
n03255030: dumbbell
href="ht:
n02102040: English springer, English springer spaniel
and there is one missing label if we ignore that dummy href= line. A thing of beauty!
Also, the lines are not sorted by synset. If we sort them, the first three lines are:
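A minimal sketch of how we might skip that dummy line when parsing the page, assuming every valid line has the form "n" plus eight digits, a colon, and the label. The names WNID_LINE and parse_labels are made up for this illustration, and the three input lines are the ones quoted above:

```python
import re

# Assumption: a valid line looks like "n03255030: dumbbell".
WNID_LINE = re.compile(r"^(n\d{8}): (.+)$")

raw_lines = [
    "n03255030: dumbbell",
    'href="ht:',
    "n02102040: English springer, English springer spaniel",
]

def parse_labels(lines):
    """Return (wnid, label) pairs, silently dropping malformed lines."""
    pairs = []
    for line in lines:
        match = WNID_LINE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

labels = parse_labels(raw_lines)
for wnid, label in labels:
    print(wnid, label)
```

Counting the parsed pairs on the full page is then a quick way to notice that one label is missing.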
n01440764: tench, Tinca tinca
n01443537: goldfish, Carassius auratus
n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
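The sort itself can be sketched as follows. Since every WordNet ID shown here is "n" followed by eight digits, plain string sorting gives the same result as numeric sorting. The input is just the six lines quoted in this section, not the full page, and sort_by_wnid is a hypothetical helper name:

```python
# Six lines quoted in this section, in an arbitrary unsorted order.
page_lines = [
    "n02119789: kit fox, Vulpes macrotis",
    "n02100735: English setter",
    "n02096294: Australian terrier",
    "n01484850: great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias",
    "n01443537: goldfish, Carassius auratus",
    "n01440764: tench, Tinca tinca",
]

def sort_by_wnid(lines):
    """Split each line into (wnid, label) and sort by the WordNet ID."""
    pairs = [tuple(line.split(": ", 1)) for line in lines]
    return sorted(pairs, key=lambda pair: pair[0])

sorted_pairs = sort_by_wnid(page_lines)
for wnid, label in sorted_pairs[:3]:
    print(f"{wnid}: {label}")
```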
gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57 has lines of type:
n02119789 1 kit_fox
n02100735 2 English_setter
n02110185 3 Siberian_husky
so it is numbered from 1 in exactly the same order as image-net.org/challenges/LSVRC/2012/browse-synsets.php
gist.github.com/yrevar/942d3a0ac09ec9e5eb3a lists all 1k labels as a plaintext file with their benchmark IDs.
{0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
so it is numbered from 0 in the sorted order of image-net.org/challenges/LSVRC/2012/browse-synsets.php
The official ordering used in the benchmark data can be seen in LOC_synset_mapping.txt, e.g. www.kaggle.com/competitions/imagenet-object-localization-challenge/data?select=LOC_synset_mapping.txt
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
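A rough sketch of turning that file into lookup tables, assuming each line is a WordNet ID, a single space, and then the label, and assuming the 0-based line position is the class index (which matches the 0-based gist above, but is an assumption here rather than something the file states). load_synset_mapping is a hypothetical helper, and the in-memory sample stands in for the real file:

```python
import io

# Stand-in for open("LOC_synset_mapping.txt"); only the three lines quoted above.
sample = io.StringIO(
    "n01440764 tench, Tinca tinca\n"
    "n01443537 goldfish, Carassius auratus\n"
    "n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias\n"
)

def load_synset_mapping(fileobj):
    """Return (wnid -> 0-based index, list of labels ordered by index)."""
    wnid_to_index = {}
    index_to_label = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # ignore blank lines so they do not shift the indices
        wnid, label = line.split(" ", 1)
        wnid_to_index[wnid] = len(index_to_label)
        index_to_label.append(label)
    return wnid_to_index, index_to_label

wnid_to_index, index_to_label = load_synset_mapping(sample)
print(wnid_to_index["n01443537"], index_to_label[1])
```

With the real file, passing an open file handle instead of the sample should work the same way.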
huggingface.co/datasets/imagenet-1k also has some useful metrics on the splits:
  • train: 1,281,167 images, 145.7 GB zipped
  • validation: 50,000 images, 6.67 GB zipped
  • test: 100,000 images, 13.5 GB zipped
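Adding up the three splits gives 1,431,167 images and about 165.87 GB zipped, which is close to but not identical to the 167.62 GB figure from Kaggle, presumably because the two sites package the data differently. A small sketch of that arithmetic, using only the numbers listed above:

```python
# Split figures as quoted from the Hugging Face dataset page above.
splits = {
    "train": {"images": 1_281_167, "zipped_gb": 145.7},
    "validation": {"images": 50_000, "zipped_gb": 6.67},
    "test": {"images": 100_000, "zipped_gb": 13.5},
}

total_images = sum(s["images"] for s in splits.values())
total_gb = sum(s["zipped_gb"] for s in splits.values())

print(f"total images: {total_images:,}")
print(f"total zipped size: {total_gb:.2f} GB")
for name, s in splits.items():
    # Share of all images that falls in this split.
    print(f"{name}: {s['images'] / total_images:.1%} of images")
```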
