EnteroBase is strain-based. Each strain is associated with metadata and genomic assemblies, as well as with deduced genotyping data. All assemblies are performed de novo from Illumina/PacBio reads using a standardised, versioned pipeline. Unless the user explicitly chooses otherwise, only assemblies that match the pre-defined criteria described in QA evaluation are displayed, and where multiple assemblies are associated with a strain, only the best assembly according to QA evaluation is displayed.
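The display rule above (one assembly per strain, restricted to those passing QA unless the user opts in) can be sketched as follows. This is a minimal illustration, not EnteroBase's implementation: the `Assembly` record, its `barcode`, `passes_qa` and `qa_score` fields, and the `include_failed` flag are all hypothetical names, and the actual QA criteria and ranking are those described in QA evaluation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Assembly:
    strain: str       # strain the assembly belongs to
    barcode: str      # hypothetical assembly identifier
    passes_qa: bool   # True if the pre-defined QA criteria are met (assumed flag)
    qa_score: float   # hypothetical ranking value; higher is treated as better


def displayed_assemblies(
    assemblies: Iterable[Assembly], include_failed: bool = False
) -> Dict[str, Assembly]:
    """Return the single assembly shown for each strain."""
    best: Dict[str, Assembly] = {}
    for asm in assemblies:
        # By default, assemblies that fail QA are never displayed.
        if not asm.passes_qa and not include_failed:
            continue
        current = best.get(asm.strain)
        # Rank passing assemblies above failing ones, then by score.
        if current is None or (asm.passes_qa, asm.qa_score) > (
            current.passes_qa,
            current.qa_score,
        ):
            best[asm.strain] = asm
    return best


records = [
    Assembly("strain_A", "asm_1", True, 0.80),
    Assembly("strain_A", "asm_2", True, 0.95),
    Assembly("strain_B", "asm_3", False, 0.40),
]
shown = displayed_assemblies(records)
```

With these example records, `shown` contains only `asm_2` for `strain_A`; `strain_B` appears only when `include_failed=True` is passed.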
Genotyping data is deduced exclusively from assemblies. MLST data is called by BLASTN and USEARCH against a dataset of allelic sequences that differ from each other by at least 2.5%. Other genotyping methods are under development. Genotyping data is summarised in the Experimental Data pane. The full data, including assemblies, can be downloaded freely (but see Fair Usage).
All MLST-like typing methods in EnteroBase are derived from a genome assembly of sequenced reads. An explanation of this method is found in EnteroBase QAssembly.
For a general description of the in silico typing method, see EnteroBase nomenclature. For details about the application of these methods for each species:
All metadata, assemblies and genotyping data can be freely downloaded for academic purposes. In order to allow users who upload unpublished data sufficient time to perform their own analyses, we request that no analyses of user data be published without their explicit permission prior to the release date. Both metadata and genomic data will be clearly marked if they are downloaded prior to the release date. We would also consider it fair usage that users who wish to analyse very large amounts of the data stored in EnteroBase also contribute software tools to EnteroBase that facilitate the presentation and analysis of their results. Commercial enterprises may download and analyse data only with the explicit permission of the administrators, which may involve legal agreements regarding material transfer.
EnteroBase users are encouraged to upload their own reads to the website, which will be assembled and genotyped like existing public data. Submitters should note that raw data (sequence reads) will never be made public through the website to other users. The genome assembly will only be accessible to the data submitter and their buddies for 6 months after upload. Assembly data will then be made public; longer release dates can be negotiated by contacting us at email@example.com. Genotyping results (i.e. MLST, ribosomal MLST, core genome MLST and in silico serotyping) will be made public as soon as the uploaded data has been processed. User passwords on the website are encrypted and no one, including administrators, can easily access them. However, we would advise you NOT to use the same password you would use for important accounts, such as internet banking.
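The visibility rules for uploaded data described above can be summarised in a short sketch. It is illustrative only: the function names, the `kind` labels and the buddy set are hypothetical, the release date is taken as an input rather than computed, and treating the release date itself as the first public day is an assumption.

```python
from datetime import date
from typing import AbstractSet


def is_public(kind: str, release_date: date, today: date, processed: bool = True) -> bool:
    """Whether a piece of uploaded data is visible to all users."""
    if kind == "reads":
        # Raw sequence reads are never made public through the website.
        return False
    if kind == "genotyping":
        # Genotyping results become public once processing has finished.
        return processed
    if kind == "assembly":
        # Assumption: the assembly is public from the release date onwards.
        return today >= release_date
    raise ValueError(f"unknown data kind: {kind!r}")


def can_view_assembly(
    viewer: str,
    submitter: str,
    buddies: AbstractSet[str],
    release_date: date,
    today: date,
) -> bool:
    """Before release, only the submitter and their buddies see the assembly."""
    if viewer == submitter or viewer in buddies:
        return True
    return is_public("assembly", release_date, today)
```

For example, with a release date of 1 July 2024, a buddy of the submitter can view the assembly in February 2024, while an unrelated user cannot do so until the release date.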
If you use data/metadata from the website, or the analysis based on these data, please cite EnteroBase as:
Alikhan NF, Zhou Z, Sergeant MJ, Achtman M (2018) “A genomic overview of the population structure of Salmonella.” PLoS Genet 14 (4): e1007261. https://doi.org/10.1371/journal.pgen.1007261
If you use GrapeTree, please cite the preprint:
Zhou Z, Alikhan NF, Sergeant MJ, Luhmann N, Vaz C, Francisco AP, Carrico JA, Achtman M (2017) “GrapeTree: Visualization of core genomic relationships among 100,000 bacterial pathogens”, bioRxiv 216788; https://doi.org/10.1101/216788
3rd Party acknowledgements
If you use data generated by 3rd party tools in EnteroBase, please cite both EnteroBase and the paper describing the specific tool.