Exploring deeper lineages with MLST, rMLST and cgMLST

Here we explore some of the broader concepts behind EnteroBase and how they can be used to get the most out of it. One of EnteroBase's unique selling points is that it provides a global overview of an entire genus, allowing you to see where your strain sits within the whole population. Dealing effectively with such large datasets, however, requires some degree of abstraction, which we introduce here.

Thinking about classifying a bacterial population

Antigen-based typing, pathotyping and other typing methods, some of which are the de jure standard in many reference labs, do not always correlate with the genetic relatedness of individual strains. Consider the Shiga toxin genes in enterohaemorrhagic E. coli: Shiga toxin-positive E. coli are found in all phylogroups across the population. The designation ‘enterohaemorrhagic’ ultimately reflects clinical manifestation rather than any shared ancestry between such strains. Likewise, Salmonella enterica serovar Newport is made up of multiple discrete lineages, and treating it as a uniform group is misleading.

Discrimination of the clustering levels, from lowest to highest:

  • eBURST group (eBG)
  • Sequence type (ST, classical MLST)
  • Ribosomal MLST eBG
  • Ribosomal MLST ST
  • Core genome MLST

In analyses that attempt to place strains within a population, it makes sense to use a neutral set of markers from across the genome. This is the motivation behind MLST. However, classical MLST has limited discriminatory power because it examines only a handful of genes. The solution is to increase the number of genes used as informative sites, or to use SNPs instead.
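To make the idea of an allele profile concrete, here is a minimal Python sketch of classical MLST typing under simplifying assumptions: every distinct sequence at a locus gets the next allele number, and every distinct allele profile gets the next ST number. The locus names are the seven Salmonella MLST genes; the class `MLSTScheme`, the toy sequences and the numbering are hypothetical. Real schemes are curated databases and do not assign numbers this way.

```python
from typing import Dict, Tuple

# The seven housekeeping loci of the Salmonella enterica MLST scheme.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

Profile = Tuple[int, ...]


class MLSTScheme:
    """Hypothetical toy scheme: numbers alleles and STs in order of first sighting."""

    def __init__(self, loci):
        self.loci = tuple(loci)
        # One sequence -> allele-number table per locus.
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in self.loci}
        # Allele profile -> ST number.
        self.sts: Dict[Profile, int] = {}

    def _allele(self, locus: str, sequence: str) -> int:
        table = self.alleles[locus]
        if sequence not in table:
            table[sequence] = len(table) + 1  # novel sequence gets the next allele number
        return table[sequence]

    def type_strain(self, sequences: Dict[str, str]) -> Tuple[Profile, int]:
        """Return (allele profile, ST) for a strain given its sequence at each locus."""
        profile = tuple(self._allele(locus, sequences[locus]) for locus in self.loci)
        if profile not in self.sts:
            self.sts[profile] = len(self.sts) + 1  # novel profile gets the next ST number
        return profile, self.sts[profile]


scheme = MLSTScheme(LOCI)
strain_a = {locus: "ACGT" for locus in LOCI}  # toy sequences
strain_b = dict(strain_a, thrA="ACGA")  # one SNP in thrA
strain_c = dict(strain_a)  # identical to strain_a at all seven loci

profile_a, st_a = scheme.type_strain(strain_a)
profile_b, st_b = scheme.type_strain(strain_b)
profile_c, st_c = scheme.type_strain(strain_c)
print(profile_a, st_a)  # (1, 1, 1, 1, 1, 1, 1) 1
print(profile_b, st_b)  # (1, 1, 1, 1, 1, 1, 2) 2
print(profile_c, st_c)  # (1, 1, 1, 1, 1, 1, 1) 1
```

A single nucleotide change in thrA is enough to create a new allele and therefore a new ST, yet differences anywhere outside these seven loci are invisible to the scheme.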

Within EnteroBase, typing for each species is extended from classical MLST, through rMLST, to core genome MLST (cgMLST).

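Feature          | Classical MLST                           | Ribosomal MLST (rMLST)           | Core genome MLST (cgMLST)
---------------- | ---------------------------------------- | -------------------------------- | ----------------------------------------
Number of loci   | 7-8                                      | 53                               | ~1500-3000 for Salmonella
Loci used        | Conserved housekeeping genes             | Ribosomal protein genes          | Any conserved coding sequence
Variability      | Highly conserved                         | Highly conserved                 | Variable
Resolution       | Low                                      | Medium                           | High
Scheme scope     | Different scheme for each species/genus  | Single scheme across all bacteria | Different scheme for each species/genus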

(Classical MLST schemes in the literature use between about 5 and 15 loci, but those implemented in EnteroBase use 7 or 8 loci.)
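The difference in resolution between these schemes can be illustrated with the allelic distance that profile comparisons rely on: the number of loci at which two strains carry different alleles. The sketch below is illustrative only; the profiles are invented, the 20-locus list merely stands in for a real cgMLST profile, and skipping loci with missing alleles (`None`) is one common convention rather than necessarily the one EnteroBase uses.

```python
from typing import Optional, Sequence


def allelic_distance(p1: Sequence[Optional[int]], p2: Sequence[Optional[int]]) -> int:
    """Count loci where both profiles have an allele call and the calls differ.

    Loci missing (None) in either profile are skipped, not counted as differences.
    """
    if len(p1) != len(p2):
        raise ValueError("profiles must come from the same scheme")
    return sum(1 for a, b in zip(p1, p2) if a is not None and b is not None and a != b)


# Two hypothetical strains that share the same 7-locus MLST profile...
mlst_x = (1, 1, 1, 1, 1, 1, 1)
mlst_y = (1, 1, 1, 1, 1, 1, 1)
# ...but differ once many more loci are examined (20 loci stand in for thousands).
cg_x = [1] * 20
cg_y = [1] * 17 + [2, 3, None]

print(allelic_distance(mlst_x, mlst_y))  # 0
print(allelic_distance(cg_x, cg_y))  # 2
```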

https://bitbucket.org/repo/Xyayxn/images/113611161-sal_mst.png

Minimal spanning tree (MSTree) of MLST data on 4257 isolates of S. enterica subspecies enterica. From Achtman et al. (2012) PLoS Pathog 8(6): e1002776.
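A minimal spanning tree connects all STs with the smallest total number of allele differences. As a rough sketch of the idea only, the code below builds one with Kruskal's algorithm on pairwise allelic distances between hypothetical ST profiles. It is not necessarily the algorithm used to draw the figure above or the one implemented in EnteroBase, which may apply additional rules, for example for breaking ties between equally short links.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def allelic_distance(p1: Sequence[int], p2: Sequence[int]) -> int:
    """Number of loci at which two complete allele profiles differ."""
    return sum(1 for a, b in zip(p1, p2) if a != b)


def minimum_spanning_tree(profiles: Dict[str, Sequence[int]]) -> List[Tuple[str, str, int]]:
    """Kruskal's algorithm: add the shortest links that do not create a cycle."""
    names = sorted(profiles)
    parent = {name: name for name in names}  # union-find forest

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    edges = sorted(
        (allelic_distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(names, 2)
    )
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # a and b are not yet connected
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


profiles = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 1, 2),
    "ST-C": (1, 1, 1, 1, 1, 2, 2),
    "ST-D": (3, 3, 3, 3, 3, 3, 3),
}
tree = minimum_spanning_tree(profiles)
print(tree)  # [('ST-A', 'ST-B', 1), ('ST-B', 'ST-C', 1), ('ST-A', 'ST-D', 7)]
```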

Searching deeper within clonal complexes

EnteroBase currently supports a number of population clustering approaches:

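  • MLST (7-gene sequence type, ST)
  • eBG (eBURST group of MLST STs)
  • rST (rMLST sequence type)
  • rEBG (rMLST eBURST group)
  • cgMLST (core genome MLST)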
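eBGs (and rMLST eBGs) group related STs. A common description of a 7-gene MLST eBG is a set of STs in which each ST shares at least six of its seven alleles with at least one other member, which amounts to single-linkage clustering at a threshold of one allele difference. The sketch below implements that single-linkage rule with a configurable threshold `max_diff`; the ST names and profiles are hypothetical, and no particular threshold for rMLST eBGs is assumed.

```python
from collections import deque
from typing import Dict, List, Sequence


def allelic_distance(p1: Sequence[int], p2: Sequence[int]) -> int:
    """Number of loci at which two complete allele profiles differ."""
    return sum(1 for a, b in zip(p1, p2) if a != b)


def single_linkage_groups(profiles: Dict[str, Sequence[int]], max_diff: int) -> List[List[str]]:
    """Cluster STs so that each member is within max_diff allele differences
    of at least one other member. STs with no close neighbour form singleton groups."""
    unassigned = set(profiles)
    groups = []
    for start in sorted(profiles):
        if start not in unassigned:
            continue
        unassigned.discard(start)
        group, queue = [start], deque([start])
        while queue:  # breadth-first walk through chains of closely related STs
            current = queue.popleft()
            for other in sorted(unassigned):  # sorted() copies, so discarding is safe
                if allelic_distance(profiles[current], profiles[other]) <= max_diff:
                    unassigned.discard(other)
                    group.append(other)
                    queue.append(other)
        groups.append(sorted(group))
    return groups


profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),  # 1 difference from ST1
    "ST3": (1, 1, 1, 1, 1, 2, 2),  # 1 from ST2, 2 from ST1: joins via ST2
    "ST4": (5, 5, 5, 5, 5, 5, 5),  # unrelated
}
groups = single_linkage_groups(profiles, max_diff=1)
print(groups)  # [['ST1', 'ST2', 'ST3'], ['ST4']]
```

Raising `max_diff` merges more STs into each group, which is why coarser clustering levels sit lower on the discrimination scale.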

These clustering levels can be searched from the Experimental Data tab of the search interface. The example below shows how to search for rMLST eBG ‘4.1’, which corresponds to a sub-lineage within Salmonella serovar Enteritidis.

https://bitbucket.org/repo/Xyayxn/images/2158722747-ent_clust.png

The values for each genotyping method can be browsed through its experimental data. Use the drop-down box at the top right to select one of the available genotyping schemes.

Serovar prediction (in Salmonella) is based on the consensus of the serovar designations recorded in the metadata of strains that share the strain’s eBG (either the rMLST or the MLST eBG). Click the eye icon to see an extended breakdown.
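The consensus step can be pictured as a majority vote. The sketch below assumes a simple rule: for each eBG, take the most common serovar among member strains whose metadata record one, and report what fraction of those strains agree. EnteroBase's actual rules (for example, minimum support, or handling of ties and spelling variants) are not described here, so treat the function name, eBG labels and records as hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def consensus_serovars(
    records: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, Tuple[str, float]]:
    """Map each eBG to (most common metadata serovar, fraction of annotated strains with it).

    records: (eBG, serovar from metadata) pairs; None or "" means no serovar was recorded.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ebg, serovar in records:
        if serovar:  # ignore strains without a serovar in their metadata
            counts[ebg][serovar.strip()] += 1
    prediction = {}
    for ebg, counter in counts.items():
        serovar, n = counter.most_common(1)[0]
        prediction[ebg] = (serovar, n / sum(counter.values()))
    return prediction


records = [
    ("eBG-A", "Enteritidis"),
    ("eBG-A", "Enteritidis"),
    ("eBG-A", "Enteritidis"),
    ("eBG-A", None),  # unannotated strain; it would receive the eBG-A prediction
    ("eBG-A", "Dublin"),  # a minority designation
    ("eBG-B", "Typhimurium"),
]
print(consensus_serovars(records))
# {'eBG-A': ('Enteritidis', 0.75), 'eBG-B': ('Typhimurium', 1.0)}
```

Reporting the supporting fraction alongside the predicted serovar is a design choice that makes weak or mixed consensus visible, much as the extended breakdown does.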

For 7-gene MLST, the full allele profile is shown in the right-hand pane (scroll right to see it). For larger genotyping schemes, view the allele profile by clicking the eye icon on the left.

https://bitbucket.org/repo/Xyayxn/images/657908807-ent_clust2.png