Assignment of Source Details into categories

A two-level source classification scheme (Table X) was set up to cover the wide range of hosts and environments from which the bacteria are commonly isolated. To assign GenBank metadata to Source Niche/Type categories automatically, a “Source Details” field was set up in EnteroBase that summarizes all host-related biosample attributes of each GenBank genome. In 2015, a subset of 3,546 distinct “Source Details” entries was manually assigned to Source Niches/Types and used as ground truth to train a naive Bayes classifier implemented in the Python NLTK library (Loper and Bird 2002). For training, these manually curated entries were randomly split into a training set of 2,000 entries and a test set of 1,546 entries. The classifier trained on the training set achieved an accuracy of ~80% on the test set. It was then retrained on all 3,546 entries and used to assign all GenBank entries to categories. The classifier performed well initially, but after two years it produced a high frequency of failed assignments in practice. Its performance was therefore re-evaluated in 2018 against an independent set of 3,000 manually curated entries, and its accuracy had dropped to 60%. Further evaluation attributed the reduced accuracy to the large number of new words in recent entries that the classifier did not recognize because they were absent from its training data.

Table X.
The Source Niche/Type classification scheme in EnteroBase

| Source Niche | Source Type | Examples of Source Details |
|---|---|---|
| Aquatic | Fish; Marine Mammal; Shellfish | Tuna, lobster |
| Companion Animal | Canine; Feline | Cat, dog |
| Environment | Air; Plant; Soil/Dust; Water | River, tree, soil |
| Feed | Animal Feed; Meat | Dog treat, fishmeal |
| Food | Composite Food; Dairy; Fish; Meat; Shellfish | Milk, salami, ready-to-eat food |
| Human | Human | Patient, biopsy |
| Laboratory | Laboratory | Reference strain, serial passage |
| Livestock | Bovine; Camelid; Equine; Ovine; Swine | Horse, calf |
| Poultry | Avian | Turkey, chicken |
| Wild Animal | Amphibian; Avian; Bat; Bovine; Camelid; Canine; Deer; Equine; Feline; Invertebrates; Marsupial; Other Mammal; Ovine; Primate; Reptile; Rodent; Swine | Flamingo, frog, python, spider |
| ND | ND | |
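The training and evaluation procedure described above (random 2,000/remaining split, held-out accuracy, then retraining on all curated entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the EnteroBase code: the original used NLTK's naive Bayes classifier, whereas this sketch uses a small hand-written multinomial naive Bayes with add-one smoothing so that it runs without dependencies. The tokenizer, the “Niche/Type” label strings, and the names `NaiveBayes`, `train_and_evaluate` and `oov_rate` are hypothetical. The `oov_rate` helper shows one way to quantify the vocabulary drift blamed for the 2018 accuracy drop: words never seen in training carry no evidence in this model, so entries made mostly of new words fall back toward the class priors.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a free-text Source Details entry and split it into word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing; a stand-in for NLTK's classifier."""

    def __init__(self, examples):
        # examples: list of (source_details_text, "Niche/Type" label) pairs
        self.total = len(examples)
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.label_totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}

    def classify(self, text):
        """Return the label with the highest log posterior score."""
        best_label, best_score = None, -math.inf
        v = len(self.vocab)
        for label, n in self.label_counts.items():
            score = math.log(n / self.total)  # log prior
            denom = self.label_totals.get(label, 0) + v
            for tok in tokenize(text):
                if tok in self.vocab:  # unseen words are skipped: they carry no evidence
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, examples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    return sum(model.classify(t) == y for t, y in examples) / len(examples)


def oov_rate(model, texts):
    """Fraction of tokens in `texts` that never occurred in the model's training data."""
    tokens = [tok for t in texts for tok in tokenize(t)]
    return sum(tok not in model.vocab for tok in tokens) / len(tokens) if tokens else 0.0


def train_and_evaluate(curated, n_train=2000, seed=0):
    """Random split, held-out accuracy, then retrain on every curated entry."""
    data = list(curated)
    random.Random(seed).shuffle(data)
    train, test = data[:n_train], data[n_train:]
    held_out = accuracy(NaiveBayes(train), test)
    final_model = NaiveBayes(data)  # the deployed model uses all curated entries
    return held_out, final_model


if __name__ == "__main__":
    # Invented toy data for illustration only.
    toy = [("patient blood", "Human/Human"), ("stool from patient", "Human/Human"),
           ("raw milk", "Food/Dairy"), ("cheese milk", "Food/Dairy"),
           ("calf feces", "Livestock/Bovine"), ("bovine calf", "Livestock/Bovine")]
    model = NaiveBayes(toy)
    print(model.classify("milk"))                  # Food/Dairy
    print(oov_rate(model, ["milk salami"]))        # 0.5
```

With the real data, `train_and_evaluate(curated, n_train=2000)` would mirror the 2,000/1,546 split, and tracking `oov_rate` on newly submitted entries over time is one possible early warning that a bag-of-words classifier needs retraining.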