HierCC (Hierarchical Clustering of CgMLST)

Hierarchical Clustering of CgMLST (HierCC) defines clusters based on cgMLST. Distances between genomes are calculated from the number of shared cgMLST alleles, and genomes are linked using a single-linkage clustering criterion. These clusters are assigned stable cluster group numbers at different, fixed cgMLST allele distances. Salmonella, for instance, has cut-offs such as 1, 5, 10, 20, 50 and 100.
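
To make the single-linkage idea concrete, here is a minimal Python sketch. It is not the EnteroBase implementation: the toy allele profiles, the function names and the thresholds are all assumptions made for illustration, and distance is taken simply as the number of loci at which two profiles differ. Cluster labels here are just the smallest genome name in each group, whereas real HierCC assigns its own stable cluster IDs.

```python
from itertools import combinations


def allele_distance(a, b):
    """Toy distance: number of loci at which two allele profiles differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def single_linkage(profiles, threshold):
    """Label each genome with its cluster at the given allele threshold.

    Two genomes share a cluster if a chain of links, each no longer than
    `threshold` alleles, connects them (single linkage).
    """
    names = list(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller name as the cluster representative.
                parent[max(ra, rb)] = min(ra, rb)
    return {n: find(n) for n in names}


# Hypothetical six-locus profiles (real cgMLST schemes have thousands of loci).
profiles = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 1, 2],  # 1 allele from g1
    "g3": [1, 1, 1, 1, 3, 3],  # 2 alleles from g1 and from g2
    "g4": [4, 4, 4, 4, 3, 3],  # 4 alleles from g3, 6 from g1 and g2
}

# One clustering per cut-off, analogous to the d_1, d_2, d_5 columns.
clusters_at = {d: single_linkage(profiles, d) for d in (1, 2, 5)}
```

At threshold 5, g4 joins the same cluster as g1 even though the two are 6 alleles apart, because g3 links them. This chaining is what "links no more than N alleles apart" means under single linkage.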

Getting started and exploring HierCC

HierCC is another type of experimental data, just like 7 gene MLST. These results can be viewed through the Experimental Data dropdown. We will use an example of Salmonella Typhimurium ST 313 from Malawi to illustrate this. To search for the relevant strains, do the following:

  • Use Search strains

  • Under Strain Metadata: Country should contain ‘Malawi’.

  • Under Experimental Data:

    • Experiment Type should be 7 Gene MLST
    • Data Type should be ST where ST contains ‘313’.
  • Ignore legacy data should also be checked.

  • Click Submit

    ../_images/search1.png
    ../_images/search2.png

The search results can be seen here. To specifically look at the HierCC data:

  • Under Experimental Data: Select HierCC

    ../_images/results1.png

The HierCC results for this query are shown below as an example. Each column shows the cluster groups at a different threshold. The value for each genome is the cluster group ID. `d_5` (Delta 5 or distance 5) means the clusters include all strains with links no more than 5 alleles apart.

It is important to remember that the number shown is the ID of the HierCC group, not an allele distance, and that these group IDs are not consistent with the STs of other genotyping methods such as 7 gene MLST. So d_900:313 will not be the same as ST 313.

In the results example below, the d_400 value is 2, which means all strains in this cluster have links no more than 400 alleles apart. However, at d_50 some genomes are in HierCC d_50:14851 and some are in HierCC d_50:728. This means that these genomes fall into separate clusters when they are clustered on the criterion that all strains have links no more than 50 alleles apart.

../_images/results2.png
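
The nesting described above can be sketched in a few lines of Python. The strain names below are invented placeholders, and the table is a hand-made stand-in for a results export, not real EnteroBase output. Only the cluster IDs (d_400:2, d_50:728, d_50:14851) come from the example.

```python
from collections import defaultdict

# Hypothetical rows: strain names are placeholders; cluster IDs follow the example.
rows = [
    {"strain": "strain_A", "d_400": 2, "d_50": 728},
    {"strain": "strain_B", "d_400": 2, "d_50": 728},
    {"strain": "strain_C", "d_400": 2, "d_50": 14851},
]


def group_by_level(rows, level):
    """Map each cluster ID at `level` (e.g. 'd_50') to its member strains."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[level]].append(row["strain"])
    return dict(groups)


coarse = group_by_level(rows, "d_400")  # one cluster: all strains share ID 2
fine = group_by_level(rows, "d_50")     # two clusters: IDs 728 and 14851
```

The same strains form one group at the looser threshold and split into two at the tighter one. The IDs themselves are only labels and carry no distance information.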

We will illustrate this with GrapeTree below.

Generating and annotating a tree based on HierCC

HierCC data can also be imported into GrapeTree figures, like any other experimental data. Let’s continue with the strains from Malawi as an example.

To generate a GrapeTree:

  • Select cgMLST under Experimental Data

  • Click the GrapeTree icon (highlighted in the red box below)

    ../_images/tree1.png
  • In the Create GrapeTree dialog:

    • Give your Tree a meaningful name under Name
    • Algorithm: RapidNJ
    • Click Submit
    ../_images/tree2.png

A new browser window will open, and it will take a few seconds for the GrapeTree to be generated (be sure to allow pop-ups in your browser).

../_images/finaltree.png

The GrapeTree here is annotated and colour-coded with the HierCC d_50 groupings. To do this on your own tree, do the following:

  • Under the EnteroBase tab, click Import Fields.

  • In the Add Columns dialog:

    • Experiment should be HierCC and Column should be d_50. Click Add.
    • ‘d_50(HierCC)’ should be added to the list of columns to import (on the right).
    • Click OK
    ../_images/tree3.png

This should update the GrapeTree with the d_50 groups. The key is labeled with the HierCC cluster IDs, 728 and 14851, which we found in the previous section. You can clearly see the long branch (114 alleles) separating the two groups.

Searching based on HierCC

HierCC cluster IDs are searchable in EnteroBase, which is useful if you want to quickly revisit a group of strains you found before. Using the Malawi example:

  • Use Search strains

  • Under Experimental Data:

    • Experiment Type should be HierCC
    • Data Type should be d_20 where ST equals ‘728’.
  • Ignore legacy data should also be checked.

  • Click Submit

    ../_images/recover1.png

Some of the search results are shown below:

../_images/recover2.png

There is a shortcut for this process, shown below. Right-clicking on the HierCC result for a given level lets you quickly search for all strains in that cluster.

  • On a particular cell in HierCC results:
    • Right-click
    • Click on Get at this level

In the example below, I clicked on d_5:729 so the search results will be updated with all strains which are in cluster 729 (using d_5 as the threshold).

../_images/recover3.png
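
If you have exported a results table and want to reproduce this kind of lookup offline, the logic is a simple filter on one column. The sketch below assumes a hand-made table and hypothetical helper names (`parse_hiercc_label`, `strains_in_cluster`). It does not call any EnteroBase API, and the strain names and the ID 31000 are invented for illustration.

```python
def parse_hiercc_label(label):
    """Split a label such as 'd_5:729' into the level 'd_5' and the ID 729."""
    level, cluster_id = label.split(":")
    return level, int(cluster_id)


def strains_in_cluster(rows, label):
    """Return the strains whose cluster ID at the label's level matches."""
    level, cluster_id = parse_hiercc_label(label)
    return [row["strain"] for row in rows if row.get(level) == cluster_id]


# Hypothetical table: strain names and the ID 31000 are made up.
rows = [
    {"strain": "strain_A", "d_5": 729, "d_20": 728},
    {"strain": "strain_B", "d_5": 729, "d_20": 728},
    {"strain": "strain_C", "d_5": 31000, "d_20": 728},
]

same_d5 = strains_in_cluster(rows, "d_5:729")
same_d20 = strains_in_cluster(rows, "d_20:728")
```

Note that the level and the ID must always be used together: 729 at d_5 and 729 at another level would refer to unrelated clusters.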