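Summary statistics of dataset records by taxonomic rank, plus the dna_bin and dna_barcode attributes, for BIOSCAN-5M and BIOSCAN-1M. Categories is the number of distinct values of each attribute. Labelled is the number of records that carry a value for it. Labelled (%) is that count as a share of all records (5,150,850 in BIOSCAN-5M; 1,128,313 in BIOSCAN-1M). The imbalance ratio (IR) is reported for BIOSCAN-5M only.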
              ----------------- BIOSCAN-5M -----------------  ----------- BIOSCAN-1M ------------
Attribute            IR  Categories   Labelled  Labelled (%)  Categories   Labelled  Labelled (%)
phylum                1           1  5,150,850         100.0           1  1,128,313         100.0
class           719,831          10  5,146,837          99.9           1  1,128,313         100.0
order         3,675,317          55  5,134,987          99.7          16  1,128,313         100.0
family          938,928         934  4,932,774          95.8         491  1,112,968          98.6
subfamily       323,146       1,542  1,472,548          28.6         760    265,492          23.5
genus           200,268       7,605  1,226,765          23.8       3,441    254,096          22.5
species           7,694      22,622    473,094           9.2       8,355     84,397           7.5
dna_bin          35,458     324,411  5,137,441          99.7      91,918  1,128,313         100.0
dna_barcode       3,743   2,486,492  5,150,850         100.0     552,629  1,128,313         100.0
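The Categories, Labelled, Labelled (%) and IR columns can all be derived from per-record labels. Below is a minimal sketch of that computation. It assumes each attribute is given as a list with one entry per record, where None marks a record left unlabelled at that rank. It also assumes IR is the size of the largest category divided by the size of the smallest. That is a common definition and is consistent with phylum (a single category) having IR 1, but the table itself does not state it. The function name rank_summary and the toy genus labels are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def rank_summary(labels: Sequence[Optional[str]], total: int) -> dict:
    """Summarise one attribute column (hypothetical helper).

    labels: one entry per record; None means "no label at this rank".
    total:  number of records in the whole dataset.
    """
    # Count records per category, ignoring unlabelled records.
    counts = Counter(lab for lab in labels if lab is not None)
    labelled = sum(counts.values())
    # ASSUMPTION: IR = largest category size / smallest category size.
    ir = max(counts.values()) / min(counts.values()) if counts else float("nan")
    return {
        "categories": len(counts),                          # distinct values
        "labelled": labelled,                               # records with a value
        "labelled_pct": round(100 * labelled / total, 1),   # share of all records
        "imbalance_ratio": ir,
    }


# Toy example: 6 records, 4 labelled at genus level (hypothetical labels).
genus = ["Aedes", "Aedes", "Aedes", "Culex", None, None]
print(rank_summary(genus, total=len(genus)))

# Consistency check against the species row of BIOSCAN-5M:
# 473,094 labelled records out of 5,150,850 total.
print(round(100 * 473_094 / 5_150_850, 1))
```

The first call prints `{'categories': 2, 'labelled': 4, 'labelled_pct': 66.7, 'imbalance_ratio': 3.0}` and the second prints `9.2`, matching the reported Labelled (%) for species. Only the IR definition is an assumption. The percentage column is recoverable directly from the Labelled counts and the dataset totals.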