Number of unique categories and the number and percentage of expert-labeled samples in the BIOSCAN-1M Insect dataset at each taxonomic rank. The final column reports the Barcode Index Number (BIN), a genetically derived grouping that serves as a proxy for species labels. Every sample has an associated BIN, and there are roughly 10× as many unique BINs as species labels.
| | Phylum | Class | Order | Family | Subfamily | Tribe | Genus | Species | BIN |
|---|---|---|---|---|---|---|---|---|---|
| Categories | 1 | 1 | 16 | 491 | 760 | 535 | 3,441 | 8,355 | 90,918 |
| Labeled samples | 1,128,313 | 1,128,313 | 1,128,313 | 1,112,968 | 265,492 | 60,477 | 254,096 | 84,397 | 1,128,313 |
| Labeled (%) | 100.0 | 100.0 | 100.0 | 98.6 | 23.5 | 5.4 | 22.5 | 7.5 | 100.0 |
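
As a quick consistency check, the percentage row and the BIN-to-species ratio can be recomputed from the counts in the table. The following is a minimal sketch; the variable and function names (`labeled_samples`, `categories`, `labeled_percent`) are illustrative and not part of the dataset's tooling, and the total of 1,128,313 samples is taken from the fully labeled ranks.

```python
# Counts copied from the table; names here are illustrative only.
TOTAL_SAMPLES = 1_128_313  # every sample is labeled at phylum, class, and order

labeled_samples = {
    "Phylum": 1_128_313,
    "Class": 1_128_313,
    "Order": 1_128_313,
    "Family": 1_112_968,
    "Subfamily": 265_492,
    "Tribe": 60_477,
    "Genus": 254_096,
    "Species": 84_397,
    "BIN": 1_128_313,
}

categories = {
    "Phylum": 1,
    "Class": 1,
    "Order": 16,
    "Family": 491,
    "Subfamily": 760,
    "Tribe": 535,
    "Genus": 3_441,
    "Species": 8_355,
    "BIN": 90_918,
}


def labeled_percent(count: int, total: int = TOTAL_SAMPLES) -> float:
    """Return the share of labeled samples as a percentage, to one decimal."""
    return round(100.0 * count / total, 1)


# Percentage of samples carrying a label at each rank.
percentages = {rank: labeled_percent(n) for rank, n in labeled_samples.items()}

# Number of unique BINs per unique species label.
bin_to_species_ratio = categories["BIN"] / categories["Species"]

if __name__ == "__main__":
    for rank, pct in percentages.items():
        print(f"{rank:<10} {pct:5.1f}%")
    print(f"BINs per species label: {bin_to_species_ratio:.1f}")
```

The recomputed percentages agree with the "Labeled (%)" row, and the ratio of unique BINs to unique species labels is a little under 11, consistent with the "roughly 10×" statement in the caption.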