Number of unique categories, and number and proportion of expert-labeled samples, at each taxonomic rank of the BIOSCAN-1M Insect dataset. The last column gives Barcode Index Number (BIN) information, a genetic alternative to taxonomic labels that serves as a species proxy. All samples have an associated BIN, and there are roughly 11 times as many unique BINs (90,918) as species labels (8,355).

|  | Phylum | Class | Order | Family | Subfamily | Tribe | Genus | Species | BIN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Categories | 1 | 1 | 16 | 491 | 760 | 535 | 3,441 | 8,355 | 90,918 |
| Labeled Samples | 1,128,313 | 1,128,313 | 1,128,313 | 1,112,968 | 265,492 | 60,477 | 254,096 | 84,397 | 1,128,313 |
| Labeled (%) | 100.0 | 100.0 | 100.0 | 98.6 | 23.5 | 5.4 | 22.5 | 7.5 | 100.0 |
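
The "Labeled (%)" values can be checked directly from the sample counts. The following is a minimal sketch, not part of the dataset's tooling: the counts are copied from the table, the total of 1,128,313 is taken from the fully labeled ranks, and the names `labeled`, `categories`, and `labeled_percent` are hypothetical, introduced here only for illustration.

```python
# Counts copied from the table; all names below are illustrative assumptions.
TOTAL_SAMPLES = 1_128_313  # every sample has a phylum, class, order, and BIN

labeled = {
    "Phylum": 1_128_313,
    "Class": 1_128_313,
    "Order": 1_128_313,
    "Family": 1_112_968,
    "Subfamily": 265_492,
    "Tribe": 60_477,
    "Genus": 254_096,
    "Species": 84_397,
    "BIN": 1_128_313,
}

categories = {"Species": 8_355, "BIN": 90_918}


def labeled_percent(count: int, total: int = TOTAL_SAMPLES) -> float:
    """Share of samples labeled at a rank, as a percentage with one decimal."""
    return round(100.0 * count / total, 1)


# Percentage of labeled samples per rank, matching the "Labeled (%)" row.
percentages = {rank: labeled_percent(n) for rank, n in labeled.items()}

# Ratio of unique BINs to unique species labels.
bin_to_species_ratio = categories["BIN"] / categories["Species"]

for rank, pct in percentages.items():
    print(f"{rank:<10} {pct:5.1f}%")
print(f"BINs per species label: {bin_to_species_ratio:.1f}")
```

Rounding to one decimal place reproduces the table's percentages, and the BIN-to-species ratio comes out just under 11.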