Distribution of taxonomic and BIN categories in BIOSCAN-5M. For each attribute, we show the most populated value and the least populated value (in the event of a tie, we show an example selected at random). Sizes are numbers of samples; the mean, median, and standard deviation are taken over the number of samples per category.
| Attribute | Categories | Most populated (name) | Most populated (size) | Least populated (name) | Least populated (size) | Mean | Median | Std. dev. |
|---|---|---|---|---|---|---|---|---|
| phylum | 1 | Arthropoda | 5,150,850 | Arthropoda | 5,150,850 | 5,150,850.0 | 5,150,850.0 | 0.0 |
| class | 10 | Insecta | 5,038,818 | Ostracoda | 7 | 514,683.7 | 369.0 | 1,508,192.8 |
| order | 55 | Diptera | 3,675,317 | Cumacea | 1 | 93,363.4 | 172.0 | 495,969.5 |
| family | 934 | Cecidomyiidae | 938,928 | Pyrgodesmidae | 1 | 5,281.3 | 63.5 | 45,321.1 |
| subfamily | 1,542 | Metopininae | 323,146 | Bombyliinae | 1 | 953.7 | 23.0 | 9,092.8 |
| genus | 7,605 | Megaselia | 200,268 | chalMalaise9590 | 1 | 161.3 | 6.0 | 2,492.2 |
| species | 22,622 | Psychoda sp. 11GMK | 7,694 | Microcephalops sp. China3 | 1 | 20.9 | 2.0 | 139.5 |
| dna_bin | 324,411 | BOLD:AEO1530 | 35,458 | BOLD:ADT1070 | 1 | 15.8 | 2.0 | 146.4 |
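
The per-attribute summary above can be reproduced in spirit with a few lines of Python. The following is a minimal sketch, not the authors' code: the function name `category_stats` and the toy label list are hypothetical, unlabelled samples are assumed to be stored as `None` or an empty string, and the population standard deviation is an assumption (it gives 0.0 for a single category, which is consistent with the phylum row).

```python
from collections import Counter
from statistics import mean, median, pstdev


def category_stats(labels):
    """Summarise how samples are distributed over the categories of one attribute."""
    # Count samples per category, skipping unlabelled entries (None or "").
    counts = Counter(label for label in labels if label)
    sizes = list(counts.values())
    # On ties, max/min return the first category encountered; the table
    # instead reports a randomly selected one.
    most_name, most_size = max(counts.items(), key=lambda kv: kv[1])
    least_name, least_size = min(counts.items(), key=lambda kv: kv[1])
    return {
        "categories": len(counts),
        "most_populated": (most_name, most_size),
        "least_populated": (least_name, least_size),
        "mean": mean(sizes),
        "median": median(sizes),
        # Assumption: population standard deviation over category sizes.
        "std_dev": pstdev(sizes),
    }


# Toy labels for illustration only, not real BIOSCAN-5M data.
toy_orders = ["Diptera"] * 6 + ["Hymenoptera"] * 3 + ["Cumacea"] + [None] * 2
stats = category_stats(toy_orders)
print(stats)
```

Applied to one metadata column at a time (for example the `order` or `dna_bin` labels), this yields one row of the table. The large gap between mean and median in every row below phylum reflects the long-tailed distribution: a few categories hold most of the samples, while many hold only one or two.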