About schemes within EnteroBase (cg/wg/r/MLST)

Here we explore some of the broader concepts behind EnteroBase and how they can be used to get the most out of it. One of the unique selling points of EnteroBase is that it provides a global overview of an entire genus, allowing you to see where your strain sits within the entire population. Dealing effectively with such large datasets, however, requires some degree of abstraction, which we introduce here.

  • All MLST-like typing methods in EnteroBase are derived from a genome assembly of sequenced reads. For an explanation of this method, see here
  • For a general description of the in silico typing method, see here

For details about the application of these methods for each species:

Why use MLST in the genomic era?

  • Still true: STs reflect real bacterial populations in Salmonella and many other bacteria
  • Mid-level resolution: suitable for long-term tracking of a pathogen, and somewhat comparable with serotyping
  • Easy to remember: ST313 (Salmonella Typhimurium) and ST131 (ExPEC E. coli)
  • Scalable: 7 integers per strain, versus ~4000 integers for cgMLST or 5 MB of A, C, G and T for a genome
  • Well established databases
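The "scalable" point can be made concrete with a minimal sketch of how a 7-gene allelic profile maps to a sequence type. The locus names are those of the Salmonella 7-gene scheme; the allele numbers, the ST numbers and the `assign_st` helper are invented for illustration and do not come from EnteroBase.

```python
from typing import Dict, Optional, Tuple

# The seven loci of the Salmonella 7-gene MLST scheme.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

Profile = Tuple[int, ...]

# Illustrative lookup table: these allele and ST numbers are made up.
ST_TABLE: Dict[Profile, int] = {
    (1, 1, 2, 1, 1, 1, 5): 9001,
    (1, 1, 2, 1, 1, 3, 5): 9002,
}


def assign_st(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for an allelic profile, or None if the profile is unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return ST_TABLE.get(profile)


# A strain is summarised by seven integers, one allele number per locus.
strain = dict(zip(LOCI, (1, 1, 2, 1, 1, 3, 5)))
print(assign_st(strain))  # 9002
```

A whole strain is reduced to one short tuple, which is why profiles are cheap to store and compare across very large collections.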

Thinking about classifying a bacterial population

Typing methods based around antigenicity, pathotyping and other typing methods, some of which are the de jure standard in many reference labs, do not always correlate with the relatedness of individual strains. Consider the presence of the Shiga toxin genes in Enterohaemorrhagic E. coli: Shiga toxin-positive E. coli are found in all phylogroups across the population. The designation Enterohaemorrhagic ultimately describes a clinical manifestation rather than any shared ancestry between such strains. Likewise, Salmonella enterica serovar Newport is made up of multiple discrete lineages, and treating it as uniform is misleading.

Typing methods ordered by discrimination (low to high):

  • eBURST Group (eBG)
  • Sequence type (MLST)
  • Ribosomal MLST eBG
  • Ribosomal MLST ST
  • Core genome MLST
  • SNPs

In analyses attempting to place strains within a population, it makes sense to use a neutral set of markers from across the genome. This is the motivation behind MLST. However, classical MLST is limited in its discriminatory power because it focuses on only a handful of genes. The solution is to increase the number of genes, or to use SNPs, as the informative sites.

It should be noted that STs are arbitrary constructs, and natural populations can each encompass multiple, related ST variants. Therefore, 7-gene STs are grouped into ST Complexes in Escherichia/Shigella by an eBURST approach, and into the equivalent eBURST groups (eBGs) in S. enterica. EnteroBase has also implemented similar population groups (reBGs) for rMLST in Salmonella, which are largely consistent with eBGs or their sub-populations.
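The idea of grouping related STs can be sketched as follows. This is a deliberately simplified, eBURST-like illustration, not EnteroBase's implementation: it assumes that two STs are linked when their profiles differ at exactly one locus (single-locus variants) and reports the connected components. The profiles and the `group_by_slv` helper are hypothetical, and the real eBURST algorithm does more (for example, identifying founders).

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def group_by_slv(profiles: Dict[int, Profile]) -> List[List[int]]:
    """Group STs into components linked by single-locus variants (union-find)."""
    parent = {st: st for st in profiles}

    def find(st: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if hamming(profiles[a], profiles[b]) == 1:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


# Invented profiles: STs 1, 2 and 3 form a chain of single-locus variants.
example = {
    1: (1, 1, 1, 1, 1, 1, 1),
    2: (1, 1, 1, 1, 1, 1, 2),
    3: (1, 1, 1, 1, 1, 3, 2),
    4: (5, 6, 7, 8, 9, 10, 11),
}
print(group_by_slv(example))  # [[1, 2, 3], [4]]
```

Note that STs 1 and 3 differ at two loci but still fall into the same group, because both are single-locus variants of ST 2.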

Within EnteroBase, typing for each species is also extended beyond classical MLST to rMLST and core genome MLST.

MLST Classic                            | Ribosomal MLST                      | Core Genome MLST
7-8 Loci                                | 53 Loci                             | ~1500-3000 Loci for Salmonella
Conserved Housekeeping genes            | Ribosomal proteins                  | Any conserved coding sequence
Highly conserved; Low resolution        | Highly conserved; Medium resolution | Variable; High resolution
Different scheme for each Species/genus | Single scheme across tree of life   | Different scheme for each Species/genus
https://bitbucket.org/repo/Xyayxn/images/113611161-sal_mst.png

Figure 1: Minimal spanning tree (MSTree) of MLST data on 4257 isolates of S. enterica subspecies enterica. From Achtman et al. (2012) PLoS Pathog 8(6): e1002776.
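To give a feel for how a tree like the one in Figure 1 can be derived from allelic profiles, here is a minimal sketch that builds a minimum spanning tree with Prim's algorithm, using the number of differing loci as the distance. It is an illustration only: the profiles are invented, `minimum_spanning_tree` is a hypothetical helper, and the MSTree method used for the figure may handle ties and missing data differently.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm: repeatedly attach the closest node not yet in the tree."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to an outside node; ties break by name.
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Invented 7-locus profiles for four strains.
example = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1, 2),
    "C": (1, 1, 1, 1, 1, 3, 2),
    "D": (4, 1, 1, 1, 1, 3, 2),
}
print(minimum_spanning_tree(example))
# [('A', 'B', 1), ('B', 'C', 1), ('C', 'D', 1)]
```

This naive version compares every tree node with every outside node at each step, which is fine for a handful of profiles but too slow for collections the size of the one in the figure.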

Searching deeper within clonal complexes

EnteroBase currently supports a number of population clustering approaches:

  • MLST
  • eBG
  • rST
  • reBG
  • cgMLST

These methods can be searched through the Experimental Data tab when searching. The example below shows how to search for rMLST eBG ‘4.1’, which corresponds to a sub-lineage within Salmonella serovar Enteritidis.

https://bitbucket.org/repo/Xyayxn/images/2158722747-ent_clust.png

The values can be browsed in the experimental data for each genotyping method. From the drop-down box at the top right, you can select any of the available genotyping schemes.

Serovar prediction (in Salmonella) is based on the consensus of the serovar designations recorded in the metadata of strains that share the strain’s eBG (either rMLST or MLST). Click the eye icon to see an extended breakdown.
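The consensus idea can be sketched as a simple majority vote per eBG. This is an assumption about what "consensus" means here, not EnteroBase's actual rule: the `consensus_serovar_by_ebg` helper, the eBG labels and the records are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def consensus_serovar_by_ebg(
    records: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, Tuple[str, float]]:
    """Map each eBG to its most common metadata serovar and that serovar's share."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ebg, serovar in records:
        if serovar:  # skip strains with no serovar recorded in the metadata
            counts[ebg][serovar] += 1

    consensus: Dict[str, Tuple[str, float]] = {}
    for ebg, counter in counts.items():
        serovar, n = counter.most_common(1)[0]
        consensus[ebg] = (serovar, n / sum(counter.values()))
    return consensus


# Invented (eBG, metadata serovar) pairs; None marks missing metadata.
records = [
    ("eBG-X", "Enteritidis"),
    ("eBG-X", "Enteritidis"),
    ("eBG-X", "Dublin"),
    ("eBG-X", None),
    ("eBG-Y", "Newport"),
]
result = consensus_serovar_by_ebg(records)
print(result["eBG-X"][0])  # Enteritidis
print(result["eBG-Y"])     # ('Newport', 1.0)
```

Returning the share alongside the serovar mirrors the kind of breakdown a user would want before trusting a prediction: a consensus supported by two thirds of the strains is weaker evidence than a unanimous one.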

7 Gene MLST shows the full allele profile in the right-hand pane if you scroll right. Larger genotyping schemes show the allele profile through the eye icon on the left.

https://bitbucket.org/repo/Xyayxn/images/657908807-ent_clust2.png