A Powerful, User-Friendly Online Resource for Analyzing and Visualizing Genomic Variation within Enteric Bacteria
EnteroBase aims to establish a world-class, one-stop, user-friendly, backwards-compatible but forward-looking genome database (EnteroBase), together with a set of web-based tools (the EnteroBase Backend Pipeline), to enable bacteriologists to identify, analyse, quantify and visualise genomic variation, principally within the genera:
EnteroBase is strain-based. Each strain is associated with metadata and genomic assemblies, as well as with deduced genotyping data. All assemblies are performed de novo from Illumina/PacBio reads using a standardised, versioned pipeline. Unless the user explicitly chooses otherwise, only assemblies that match the pre-defined criteria described in Quality Assessment evaluation are displayed. Where multiple assemblies are associated with a strain, only the best assembly according to Quality Assessment evaluation is displayed.
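The display rule above can be sketched as a small filter-then-pick step. This is only an illustration under stated assumptions: the `Assembly` class, the `passes_qa` flag and the numeric `qa_score` are hypothetical names, and the actual criteria are the ones documented in Quality Assessment evaluation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Assembly:
    barcode: str     # hypothetical identifier for the assembly
    passes_qa: bool  # hypothetical flag: meets the pre-defined QA criteria
    qa_score: float  # hypothetical ranking value: higher means better


def displayed_assembly(
    assemblies: Sequence[Assembly], include_failed: bool = False
) -> Optional[Assembly]:
    """Return the single assembly that would be shown for one strain.

    By default only assemblies that pass QA are considered; setting
    include_failed=True models the "explicitly chosen" case.
    """
    candidates = [a for a in assemblies if include_failed or a.passes_qa]
    if not candidates:
        return None
    # Among the remaining candidates, keep the best one.
    return max(candidates, key=lambda a: a.qa_score)
```

A strain whose assemblies all fail QA yields `None` by default, which corresponds to nothing being displayed unless the user opts in.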
Genotyping data is deduced exclusively from assemblies. MLST alleles are called by BLASTN and USEARCH against a dataset of allelic sequences that differ from each other by at least 2.5%. Other genotyping methods are under development. Genotyping data is summarised in the Experimental Data pane. The full data, including assemblies, can be downloaded freely (but see Fair Usage).
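To make the 2.5% threshold concrete, here is a minimal sketch of how a set of reference alleles could be thinned so that every pair differs by at least that much. It assumes pre-aligned sequences of equal length and a simple greedy pass; the function names are hypothetical, and this is not the procedure EnteroBase itself uses, which relies on BLASTN and USEARCH.

```python
from typing import Iterable, List


def divergence(a: str, b: str) -> float:
    """Fraction of mismatching positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def pick_references(alleles: Iterable[str], min_divergence: float = 0.025) -> List[str]:
    """Greedily keep alleles that differ from every kept allele by >= min_divergence."""
    kept: List[str] = []
    for seq in alleles:
        if all(divergence(seq, ref) >= min_divergence for ref in kept):
            kept.append(seq)
    return kept
```

With 100 bp sequences, an allele with a single mismatch to a kept reference (1% divergence) is dropped, while one with five mismatches (5%) is retained.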
All MLST-like typing methods in EnteroBase are derived from a genome assembly of sequenced reads. An explanation of this method is found in EnteroBase QAssembly.
For a general description of the in silico typing method, see Enterobase nomenclature. For details about the application of these methods for each species:
All public metadata, assemblies and genotyping data can be freely downloaded for academic purposes. To allow users who upload unpublished data sufficient time to perform their own analyses, we restrict downloads of their genomes, metadata, and genotypes prior to the release date to the owner who uploaded the data, their self-declared ‘buddies’, curators and administrators. Other users can see these data in the workspace browser or use them to generate a tree. However, that tree cannot be used for publication by anybody who does not have explicit rights.
We would also consider it fair usage that users who wish to analyse very large amounts of the data stored in EnteroBase also contribute software tools to EnteroBase that facilitate the presentation and analysis of their results. Commercial enterprises may download and analyse data only after explicit permission by the administrators, which may involve legal agreements regarding material transfer.
EnteroBase users are encouraged to upload their own reads to the website, which will be assembled and genotyped like existing public data. Submitters should note that raw data (sequence reads) will never be made public through the website to other users. The genome assembly will only be accessible to the data submitter and their buddies for 6 months after upload. Assembly data will then be made public; longer release dates can be negotiated by contacting us at email@example.com. Genotyping results (i.e. MLST, ribosomal MLST, core genome MLST and in silico serotyping) will be made public as soon as the uploaded data has been processed. User passwords on the website are encrypted and no one, including administrators, can easily access them. However, we would advise you NOT to use the same password you would use for important accounts, such as internet banking.
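The release policy described above can be sketched as two small functions. This is a hedged illustration only: the function names, the `is_staff` flag (standing in for curators and administrators) and the calendar-month arithmetic are assumptions, and the sketch ignores negotiated release dates and the separate rules for commercial use.

```python
import calendar
from datetime import date
from typing import Collection


def default_release_date(uploaded: date, months: int = 6) -> date:
    """Date `months` calendar months after upload, clamped to the month's last day."""
    month_index = uploaded.month - 1 + months
    year = uploaded.year + month_index // 12
    month = month_index % 12 + 1
    day = min(uploaded.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def can_download_assembly(
    user: str,
    owner: str,
    buddies: Collection[str],
    today: date,
    release: date,
    is_staff: bool = False,
) -> bool:
    """Before the release date only the owner, buddies and staff may download."""
    if today >= release:
        return True
    return is_staff or user == owner or user in buddies
```

For example, an upload on 31 August 2023 would get a default release date of 29 February 2024 under this arithmetic, because February has no 31st day.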
- If you use data/metadata from the website, or the analysis based on these data, please cite EnteroBase as:
- Zhou Z, Alikhan NF, Mohamed K, the Agama Study Group, Achtman M (2020) “The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny and Escherichia core genomic diversity”, Genome Res. 30: 138-152, https://doi.org/10.1101/gr.251678.119
- Please cite Salmonella database as:
- Achtman M, Zhou Z, Alikhan NF, et al. (2020) “Genomic diversity of Salmonella enterica - The UoWUCC 10K genomes project”, Wellcome Open Research 5, 223, https://doi.org/10.12688/wellcomeopenres.16291.1
- Alikhan NF, Zhou Z, Sergeant MJ, Achtman M (2018) “A genomic overview of the population structure of Salmonella”, PLoS Genet 14(4): e1007261, https://doi.org/10.1371/journal.pgen.1007261
- Please cite Clostridioides database as:
- Frentrup M, Zhou Z, Steglich M, et al. (2020) “A publicly accessible database for Clostridioides difficile genome sequences supports tracing of transmission chains and epidemics”, Microb Genom. 6(8), https://doi.org/10.1099/mgen.0.000410
- If you use GrapeTree please cite the following paper:
- Zhou Z, Alikhan NF, Sergeant MJ, Luhmann N, Vaz C, Francisco AP, Carrico JA, Achtman M (2018) “GrapeTree: Visualization of core genomic relationships among 100,000 bacterial pathogens”, Genome Research. 28(9): 1395–1404, https://doi.org/10.1101/gr.232397.117
3rd Party acknowledgements
If you use data generated by 3rd party tools in EnteroBase, please cite both EnteroBase and the paper describing the specific tool.
- rMLST is Copyright 2010-2016, University of Oxford. rMLST is described in: Jolley et al. 2012 Microbiology 158:1005-15.
- Serovar predictions (SISTR) are calculated using the pipeline developed by the SISTR team, which is described in: Yoshida et al. 2016 PLoS ONE 11(1): e0147101, and evaluated in: Robertson et al. 2018 Microb Genom 4(2): e000151
- SeqSero2 is described in: Zhang et al. 2019 Appl Environ Microbiol 85:e01746-19
- ClermonTyping is described in: Beghain et al. 2018 Microb Genom 4(7): e000192
- EzClermont is described in: Waters et al. 2020 DOI 10.1099/acmi.0.000143
- FimTyper is described in: Roer et al. 2017 J Clin Microbiol 55(8):2538-2543