About schemes within EnteroBase (cg/wg/r/MLST)

Here we explore some of the broader concepts behind EnteroBase and how these can be used to get the most out of it. One of the unique selling points of EnteroBase is that it provides a global overview of an entire genus, allowing you to see where your strain sits within the entire population. Dealing effectively with such large datasets, however, requires some degree of abstraction, which we introduce here.

All MLST-like typing methods in EnteroBase are derived from a genome assembly of sequenced reads. For an explanation of this method, see EnteroBase QAssembly.

For a general description of the in silico typing method, see EnteroBase nomenclature.

For details about the application of these methods for each species:

Why use MLST in the genomic era?

  • Still true: MLST reflects real bacterial population structure in Salmonella and many other bacteria
  • Mid-level resolution: suited to long-term tracking of a pathogen, and somewhat comparable with serotyping
  • Easy to remember: e.g. ST313 (Salmonella Typhimurium) and ST131 (ExPEC E. coli)
  • Scalable: 7 integers per strain, or ~4000 integers for cgMLST, versus 5 MB of ACGTs
  • Well established databases
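
To make the scalability point concrete, here is a minimal Python sketch of how a 7-gene allele profile can act as a compact key for a sequence type. The lookup table, allele numbers and ST labels are invented for illustration, and `assign_st` and `profile_size_bytes` are hypothetical helpers, not part of EnteroBase.

```python
from typing import Dict, Optional, Tuple

Profile = Tuple[int, ...]

# Hypothetical lookup table: the allele numbers and ST labels below are
# invented for illustration and do not come from any real database.
ST_TABLE: Dict[Profile, int] = {
    (10, 7, 12, 9, 5, 9, 2): 9001,
    (10, 7, 12, 9, 5, 9, 3): 9002,
}


def assign_st(profile: Profile) -> Optional[int]:
    """Return the ST for an exact allele-profile match, or None if unseen."""
    return ST_TABLE.get(tuple(profile))


def profile_size_bytes(n_loci: int, bytes_per_allele: int = 4) -> int:
    """Rough storage cost of a profile, assuming fixed-width integers."""
    return n_loci * bytes_per_allele


known = assign_st((10, 7, 12, 9, 5, 9, 2))    # matches the first table entry
unknown = assign_st((1, 1, 1, 1, 1, 1, 1))    # no match: a novel profile
seven_gene_cost = profile_size_bytes(7)       # 7 loci at 4 bytes each
```

Under the assumption of 4-byte integers, a 7-locus profile costs a few tens of bytes, compared with millions of bytes for the assembled genome it summarises.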

Classifying a bacterial population

Typing methods based on antigenicity, pathotyping and similar properties, some of which are the de jure standard in many reference labs, do not always correlate with the relatedness of individual strains. Consider the Shiga toxin genes in enterohaemorrhagic E. coli: Shiga toxin-positive E. coli are found in all phylogroups across the population. The designation enterohaemorrhagic ultimately describes a clinical manifestation rather than any shared ancestry between such strains. Likewise, Salmonella enterica serovar Newport is made up of multiple discrete lineages, and treating it as uniform is misleading.

Discrimination (lowest to highest)
eBURST Group (eBG)
Sequence type (MLST)
Ribosomal MLST eBG
Ribosomal MLST ST
Core genome MLST

In analyses attempting to place strains within a population, it makes sense to use a neutral set of markers from across the genome. This is the motivation behind MLST. However, classical MLST is limited in its discriminatory power, as it focuses on only a handful of genes. The solution is to increase the number of genes, or to use SNPs, as the informative sites.
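
The effect of adding loci can be sketched with a simple allelic distance, counting the loci at which two profiles differ. This is an illustrative Python sketch only: the allele numbers are invented, `allelic_distance` is a hypothetical helper, and skipping uncalled loci is an assumption rather than a description of how EnteroBase handles missing data.

```python
from typing import Optional, Sequence

Allele = Optional[int]  # None marks a locus that was not called


def allelic_distance(a: Sequence[Allele], b: Sequence[Allele]) -> int:
    """Count loci at which two profiles carry different allele numbers.

    Loci missing in either profile are skipped (an assumption made here).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci in the same order")
    return sum(
        1
        for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


# Invented allele numbers: two strains identical at seven loci ...
strain_a_7gene = [10, 7, 12, 9, 5, 9, 2]
strain_b_7gene = [10, 7, 12, 9, 5, 9, 2]
# ... that differ once further (hypothetical) loci are included.
strain_a_extra = [3, 14, None, 8, 21]
strain_b_extra = [3, 15, 6, 8, 22]

distance_7gene = allelic_distance(strain_a_7gene, strain_b_7gene)
distance_extended = allelic_distance(
    strain_a_7gene + strain_a_extra, strain_b_7gene + strain_b_extra
)
```

Here the two strains are indistinguishable on the seven loci but separate once the additional loci are compared, which is the gain in resolution that larger schemes aim for.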

Within EnteroBase, typing for each species extends from classical MLST, through rMLST, to core genome MLST.

MLST Classic                            | Ribosomal MLST                      | Core Genome MLST
7-8 Loci                                | 53 Loci                             | ~1500-3000 Loci
Conserved housekeeping genes            | Ribosomal proteins                  | Any conserved coding sequence
Highly conserved; Low resolution        | Highly conserved; Medium resolution | Variable; High resolution
Different scheme for each species/genus | Single scheme across tree of life   | Different scheme for each species/genus

(Classical MLST schemes in the literature use between about 5 and 15 loci, but those implemented in EnteroBase use only 7 or 8.)

Searching deeper within clonal complexes

EnteroBase currently supports a number of population clustering approaches:

  • MLST
  • eBG
  • rST
  • rEBG
  • cgMLST

These methods can be searched through the Experimental Data tab on the search. The example below shows how to search rMLST eBG ‘4.1’ which corresponds to a sub-lineage within Salmonella serovar Enteritidis.


The values can be browsed through the experimental data for each genotyping method. From the drop-down box at the top right, you can select from the available genotyping schemes.

Serovar prediction (in Salmonella) is based on the consensus of the serovar designations recorded in the metadata for the strain's eBG (either rMLST or MLST). Click the eye icon to see an extended breakdown.
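
The idea of a consensus serovar per eBG can be sketched as a simple majority vote over metadata. The text does not spell out the exact rule EnteroBase applies, so the majority vote, the function name `consensus_serovar_by_ebg` and the eBG labels below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def consensus_serovar_by_ebg(
    records: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, Tuple[str, float]]:
    """Map each eBG to its most common metadata serovar and its share.

    Records with no serovar in the metadata are ignored. A plain majority
    vote is assumed here; it is not taken from the EnteroBase source.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ebg, serovar in records:
        if serovar:
            counts[ebg][serovar] += 1

    result: Dict[str, Tuple[str, float]] = {}
    for ebg, tally in counts.items():
        serovar, n = tally.most_common(1)[0]
        result[ebg] = (serovar, n / sum(tally.values()))
    return result


# Invented example records: (eBG label, serovar recorded in metadata).
records = [
    ("eBG-A", "Enteritidis"),
    ("eBG-A", "Enteritidis"),
    ("eBG-A", "Dublin"),
    ("eBG-A", "Enteritidis"),
    ("eBG-B", "Typhimurium"),
    ("eBG-B", None),  # strain submitted without a serovar
]
predictions = consensus_serovar_by_ebg(records)
```

Returning the share of votes alongside the label mirrors the kind of breakdown a user might want when the metadata within an eBG disagree.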

7 Gene MLST shows the full allele profile in the right-hand pane if you scroll right. Larger genotyping schemes show the allele profile through the eye icon on the left.