DNA barcode statistics for each taxonomic rank in the BIOSCAN-5M dataset. For each rank, the table gives the number of categories (subgroups), the number of unique DNA barcodes, the mean, median and standard deviation of the number of unique barcodes per category, and the average Shannon Diversity Index (SDI) of identical DNA barcodes within categories. It also reports the mean and standard deviation of pairwise DNA barcode sequence distances, aggregated across the categories at each rank.
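| Rank | Categories | Unique barcodes | Mean per category | Median per category | Std. Dev. per category | Avg. SDI | Distance mean | Distance std. dev. |
|---|---|---|---|---|---|---|---|---|
| phylum | 1 | 2,486,492 | | | | 13.71 | 158 | 42 |
| class | 10 | 2,482,891 | 248,289 | 177 | 725,237 | 5.93 | 166 | 103 |
| order | 55 | 2,474,855 | 44,997 | 57 | 225,098 | 4.89 | 128 | 53 |
| family | 934 | 2,321,301 | 2,485 | 46 | 19,701 | 3.76 | 90 | 46 |
| subfamily | 1,542 | 657,639 | 426 | 17 | 3,726 | 2.97 | 78 | 51 |
| genus | 7,605 | 531,109 | 70 | 5 | 1,061 | 1.82 | 50 | 39 |
| species | 22,622 | 202,260 | 9 | 2 | 37 | 1.01 | 17 | 18 |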
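The per-rank statistics in the table can be approximated with a short Python sketch. The reported means are consistent with the unique-barcode count divided by the number of categories (for example, 2,482,891 / 10 ≈ 248,289 for class). Beyond that, the sketch rests on assumptions the table does not confirm: records are `(category_label, barcode)` pairs for one rank, the SDI uses the natural logarithm, pairwise distance is Levenshtein edit distance between unique barcodes within a category, and distances are pooled across categories. The function names `shannon_diversity`, `levenshtein` and `rank_statistics` are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict
from itertools import combinations


def shannon_diversity(counts, base=math.e):
    """Shannon Diversity Index H = -sum(p_i * log(p_i)) over barcode frequencies.

    The logarithm base is an assumption (natural log by default).
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def levenshtein(a, b):
    """Edit distance between two sequences (assumed distance metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def rank_statistics(records):
    """Summarize one taxonomic rank from (category_label, barcode) pairs."""
    # Count how often each identical barcode occurs within each category.
    by_category = defaultdict(Counter)
    for label, barcode in records:
        by_category[label][barcode] += 1

    unique_per_cat = [len(c) for c in by_category.values()]
    sdi_per_cat = [shannon_diversity(c.values()) for c in by_category.values()]

    # Assumption: distances between unique barcodes within each category, pooled.
    distances = [
        levenshtein(a, b)
        for c in by_category.values()
        for a, b in combinations(sorted(c), 2)
    ]
    return {
        "categories": len(by_category),
        # Sum of per-category unique counts (equals the global count only
        # if no barcode is shared between categories).
        "unique_barcodes": sum(unique_per_cat),
        "mean": statistics.mean(unique_per_cat),
        "median": statistics.median(unique_per_cat),
        # Sample standard deviation is undefined for a single category.
        "std": statistics.stdev(unique_per_cat) if len(unique_per_cat) > 1 else None,
        "avg_sdi": statistics.mean(sdi_per_cat),
        "dist_mean": statistics.mean(distances) if distances else None,
        "dist_std": statistics.stdev(distances) if len(distances) > 1 else None,
    }
```

Pooling distances across categories and using the sample standard deviation (`statistics.stdev`) are choices made for this sketch. Averaging per-category distance means, or using `statistics.pstdev`, would be equally plausible readings of "aggregated across subgroups".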