Phylotypes Derivation (Escherichia)

The phylotypes pipeline derives phylotype-related information for Escherichia/Shigella.

CleremontType (ClermonTyping)

This Clermont type is derived using the July 5th 2019 version of the ClermonTyping application:

https://github.com/A-BN/ClermonTyping/tree/5ae1a2baf7c95bf3794501dba00ffb4182a15b0c

CleremontType (EzClermont)

This Clermont type is derived using the August 25th 2018 version of the EzClermont application:

https://github.com/nickp60/EzClermont/tree/371fce28728ae4e3b60019ab661b4c2f8277bf97

fimh (FimTyper)

The fimh type is derived using the May 1st 2017 version of FimTyper:

https://bitbucket.org/genomicepidemiology/fimtyper/src/29801e567c44354c27bcd1777a4c1c9774f4c51a

Pathovar and Virulence factors

The pathovar and virulence factors are derived using an extension of the process and software described in the paper `BlastFrost: fast querying of 100,000s of bacterial genomes in Bifrost graphs <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02237-3>`_.

There are two stages to this process:

In the first stage, BlastFrost is used to identify the presence or absence of candidate virulence factors, based on matches to the sequences in the fasta file:

virulence_factors.fasta

The stx1, stx2, eae, ST and LT virulence factors are each recognised by the presence of the associated sequence within the assembled genome. The fasta file contains 23 sequences associated with the pInv plasmid, and pInv is taken to be present (+) if more than 10 of these sequences are identified. The fasta file also contains a number of alternative ipaH sequences; if any of these is identified, the genome is taken to be ipaH positive.
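The pInv and ipaH rules above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's own code: the function name, the sequence names and the use of '-' to mark absence are all assumptions made for the example; only the "more than 10 of 23" threshold and the "any ipaH sequence" rule come from the description above.

```python
PINV_THRESHOLD = 10  # pInv is '+' only when MORE than 10 pInv sequences match


def call_pinv_and_ipah(matched, pinv_names, ipah_names):
    """Return '+'/'-' calls for pInv and ipaH from a collection of matched names.

    `matched` holds the names of the fasta sequences that were found;
    '-' for absence is an assumption of this sketch.
    """
    matched = set(matched)
    pinv_hits = len(matched & set(pinv_names))
    return {
        'pInv': '+' if pinv_hits > PINV_THRESHOLD else '-',
        'ipaH': '+' if matched & set(ipah_names) else '-',
    }


# Hypothetical sequence names, used only to exercise the rules.
pinv_names = [f'pInv_{i:02d}' for i in range(1, 24)]  # 23 pInv-associated sequences
ipah_names = ['ipaH_a', 'ipaH_b']

print(call_pinv_and_ipah(pinv_names[:11] + ['ipaH_b'], pinv_names, ipah_names))
print(call_pinv_and_ipah(pinv_names[:10], pinv_names, ipah_names))
```

With 11 pInv matches the first call reports pInv as '+', while exactly 10 matches in the second call is not enough, because the rule requires more than 10.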

The Shigella species is identified from the Hierarchical Cluster (HC) identities using the following table. The code first looks for an HC400 match and, if there is none, for an HC1100 match, and so on through to HC2350. Where a row has multiple entries, only the first entry is used; the subsequent entries are for information, indicating the higher-level clusters within which the first cluster sits.

====================  =====  ======  ======  ======  ======
Shigella species      HC400  HC1100  HC1500  HC2000  HC2350
====================  =====  ======  ======  ======  ======
Shigella boydii              11429           192
Shigella flexneri                            192
Shigella              22378  1465    1465    1465
Shigella flexneri     17342  1465    1465    1465
Shigella flexneri     45451  1465    1465    1465
Shigella flexneri     13048  1465    1465    1465
Shigella flexneri     11341  1465    1465    1465
Shigella flexneri     11126  1465    1465    1465
Shigella boydii              1465    1465    1465
Shigella dysenteriae         4194    1465    1465
Shigella dysenteriae  45284  1466    1465    1465
Shigella boydii              1466    1465    1465
Shigella dysenteriae                 36524   1465
Shigella flexneri                            1465
Shigella boydii              7057    4191    4118
Shigella boydii       11444  4191    4191    4118
Shigella dysenteriae                 4191    4118
Shigella boydii                              4118
Shigella boydii                              45542
Shigella dysenteriae                 44944   44944
Shigella boydii                              44944
Shigella sonnei                              305
Shigella dysenteriae         4195
Shigella dysenteriae                         1463
E. albertii                                          1596
Clade V                              36538
Clade V                              48593
====================  =====  ======  ======  ======  ======
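The lookup order described above (HC400 first, then each coarser level in turn) might be sketched as follows. This is an illustrative sketch only: the function and variable names are hypothetical, and the dictionary holds just a handful of rows transcribed from the table, keyed by the first entry of each row as far as the table layout can be read.

```python
# HC levels, from the finest to the coarsest, in the order they are checked.
HC_LEVELS = ['HC400', 'HC1100', 'HC1500', 'HC2000', 'HC2350']

# A small subset of the table: only the FIRST entry of each row is a key.
SPECIES_BY_CLUSTER = {
    ('HC400', 17342): 'Shigella flexneri',
    ('HC400', 45284): 'Shigella dysenteriae',
    ('HC1100', 4194): 'Shigella dysenteriae',
    ('HC2000', 1465): 'Shigella flexneri',
    ('HC2000', 305): 'Shigella sonnei',
}


def shigella_species(hc_ids):
    """Return the species for the first HC level that has a match, else None.

    `hc_ids` maps an HC level name to the genome's cluster number at that level.
    """
    for level in HC_LEVELS:
        species = SPECIES_BY_CLUSTER.get((level, hc_ids.get(level)))
        if species is not None:
            return species
    return None


# HC1100 is checked before HC2000, so this genome is called S. dysenteriae
# even though its HC2000 cluster (1465) also has an entry of its own.
genome = {'HC400': 99999, 'HC1100': 4194, 'HC1500': 1465, 'HC2000': 1465}
print(shigella_species(genome))
```

Keying the lookup on (level, cluster) pairs keeps the "first entry only" rule explicit: the informational higher-level entries in a row never need to be stored.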

After determining whether the sample is a Shigella species, the code then determines the pathovar based on the presence or absence of virulence factors, using the following logic:

# No pathovar has been assigned yet
pathovar = ''
if virulence_factors['ipaH'] == '+':
    pathovar = 'EIEC'
if virulence_factors['ST'] == '+' or virulence_factors['LT'] == '+':
    # ipaH should not be in combination with ST or LT
    pathovar = 'ERROR' if pathovar else 'ETEC'
if virulence_factors['Stx1'] == '+' or virulence_factors['Stx2'] == '+':
    # If Stx is present then it is STEC unless eae is also present in which case it is EHEC
    if virulence_factors['eae'] == '+':
        pathovar = pathovar + ('/' if len(pathovar) else '') + 'EHEC'
    else:
        pathovar = pathovar + ('/' if len(pathovar) else '') + 'STEC'
elif virulence_factors['eae'] == '+':
    # But eae without Stx is EPEC
    pathovar = pathovar + ('/' if len(pathovar) else '') + 'EPEC'
if len(pathovar):
    pathovar = 'E. coli - ' + pathovar
else:
    # Place a dash to indicate basic E. coli
    pathovar = '-'
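To see what labels this logic produces, it can be wrapped in a function and run against a few combinations of virulence factors. The sketch below is a hedged restatement, not the pipeline's own code: the function name `classify_pathovar`, the helper `factors` and the use of '-' for an absent factor are assumptions made for the example.

```python
FACTOR_NAMES = ['ipaH', 'ST', 'LT', 'Stx1', 'Stx2', 'eae']


def factors(*present):
    """Build a virulence factor dict; '-' for absence is an assumption."""
    return {name: ('+' if name in present else '-') for name in FACTOR_NAMES}


def classify_pathovar(virulence_factors):
    """Apply the same decision order as the logic above and return the label."""
    pathovar = ''
    if virulence_factors['ipaH'] == '+':
        pathovar = 'EIEC'
    if virulence_factors['ST'] == '+' or virulence_factors['LT'] == '+':
        # ipaH should not be in combination with ST or LT
        pathovar = 'ERROR' if pathovar else 'ETEC'
    if virulence_factors['Stx1'] == '+' or virulence_factors['Stx2'] == '+':
        # Stx with eae is EHEC, Stx without eae is STEC
        label = 'EHEC' if virulence_factors['eae'] == '+' else 'STEC'
        pathovar = pathovar + ('/' if pathovar else '') + label
    elif virulence_factors['eae'] == '+':
        # eae without Stx is EPEC
        pathovar = pathovar + ('/' if pathovar else '') + 'EPEC'
    return 'E. coli - ' + pathovar if pathovar else '-'


print(classify_pathovar(factors()))               # -
print(classify_pathovar(factors('Stx1', 'eae')))  # E. coli - EHEC
print(classify_pathovar(factors('Stx2')))         # E. coli - STEC
print(classify_pathovar(factors('eae')))          # E. coli - EPEC
print(classify_pathovar(factors('LT', 'Stx1')))   # E. coli - ETEC/STEC
print(classify_pathovar(factors('ipaH', 'ST')))   # E. coli - ERROR
```

Because the Stx check runs after the ETEC check, combined labels such as ETEC/STEC are possible, and an ipaH genome carrying ST or LT is flagged as ERROR rather than silently assigned.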