LIN® codes
(trademark registered by This Genomic Life, Inc, Floyd, VA, USA)
Enterobase is able to present cgMLST clusters using Life Identification Number (LIN) codes, a recently introduced method that provides a multilevel taxonomic system.
The general principle of LIN codes is described here: A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature.
A description of its implementation based on cgMLST schemes is here: Life Identification Numbers: A bacterial strain nomenclature approach.
The algorithm for allocating LIN codes described in this paper, and implemented here, is identical to the hierarchical clustering (hierCC) algorithm implemented on Enterobase. It has therefore been possible to create a mapping between the hierCC values on Enterobase, both those created previously and those still being created, and the equivalent LIN codes. A simple picture of the way that the mapping works is shown below.
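The general LIN allocation principle can be sketched as follows. This is a minimal illustration, not Enterobase's implementation: it assumes a hypothetical list of allelic-distance thresholds (one per LIN position, coarsest first), uses a plain count of differing loci as the distance, and gives each new profile the prefix of its nearest already-coded profile for as many positions as the distance allows, then the next unused identifier at the following position, padded with zeros. In this sketch, existing codes are never modified once assigned.

```python
from typing import List, Sequence, Tuple

# Hypothetical thresholds (maximum allelic differences), coarsest position first.
THRESHOLDS: List[int] = [2000, 1000, 100, 10, 0]


def allelic_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the loci at which two cgMLST profiles carry different alleles."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_lin(profile, coded, thresholds=THRESHOLDS) -> Tuple[int, ...]:
    """Return a LIN code for `profile`, given (profile, code) pairs already coded."""
    if not coded:
        # The very first profile starts the scheme with all-zero identifiers.
        return tuple(0 for _ in thresholds)
    # Nearest previously coded profile.
    nearest_profile, nearest_code = min(
        coded, key=lambda item: allelic_distance(profile, item[0])
    )
    d = allelic_distance(profile, nearest_profile)
    # Thresholds are non-increasing, so the positions met form a leading prefix.
    shared = sum(1 for t in thresholds if d <= t)
    if shared == len(thresholds):
        return nearest_code  # same cluster at every level: reuse the code
    prefix = nearest_code[:shared]
    # Next unused identifier at position `shared` among codes with this prefix.
    used = [code[shared] for _, code in coded if code[:shared] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (len(thresholds) - shared - 1)


def assign_all(profiles, thresholds=THRESHOLDS) -> List[Tuple[int, ...]]:
    """Code profiles one at a time in arrival order; earlier codes never change."""
    coded = []
    for p in profiles:
        coded.append((p, assign_lin(p, coded, thresholds)))
    return [code for _, code in coded]


profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 2, 2), (1, 1, 1, 1)]
print(assign_all(profiles, thresholds=[3, 1, 0]))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0)]
```

Because every position corresponds to one threshold, two codes that share their first k identifiers fall in the same cluster at the k-th threshold, which is the property that lets a LIN prefix be read as a hierCC-style cluster.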
Note that some publications describing the implementation of cgMLST LIN codes imply that the clusters defined by the Enterobase hierCC algorithm are not permanent identifiers, in the mistaken belief that clusters can be merged. This misunderstanding arose from the way an initial set of clusters is defined when a hierCC scheme is first created. Once the first set of hierCC values, and therefore any associated LIN codes, has been made public, no cluster merging occurs.
Displaying LIN codes on the GUI
LIN codes are currently supported for Salmonella and E. coli, and in each case can be viewed using the ‘LIN codes’ experimental data tab.
The first column shows the complete LIN code, and subsequent columns show LIN code clusters at various levels. For example, in Salmonella the first ‘cluster’ column shows the first three identifiers of the LIN codes. Strains with identical LIN codes at this level will have identical HC2000 hierCC identifiers, such that 0-0-0 is equivalent to HC2000 = 2. Right-clicking and choosing ‘Get at this level’ finds all of the strains with the same LIN code cluster identifier, in the same way as is supported for hierarchical clustering. Strains can be sorted by cluster identifier, although this is a lexicographical (text) ordering rather than a numerical ordering of the LIN code identifiers, so that, for example, 0-0-10 sorts before 0-0-9.
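To make the cluster columns and the sorting caveat concrete, the sketch below truncates dash-separated LIN codes to their first three identifiers (as in the first Salmonella cluster column), groups strains by that prefix in the spirit of ‘Get at this level’, and contrasts a plain text sort with a numerical one. The helper names and strain codes are hypothetical; this only illustrates the behaviour described above, not the code behind the GUI.

```python
from collections import defaultdict


def lin_prefix(lin_code: str, level: int) -> str:
    """Return the first `level` identifiers of a dash-separated LIN code."""
    return "-".join(lin_code.split("-")[:level])


def group_by_level(codes: dict, level: int) -> dict:
    """Group strain names by their LIN cluster identifier at `level`."""
    groups = defaultdict(list)
    for strain, code in codes.items():
        groups[lin_prefix(code, level)].append(strain)
    return dict(groups)


def numeric_key(lin_code: str) -> tuple:
    """Sort key that compares identifiers as integers rather than as text."""
    return tuple(int(part) for part in lin_code.split("-"))


codes = {"A": "0-0-9-1", "B": "0-0-10-0", "C": "0-0-9-4", "D": "0-1-0-0"}
print(group_by_level(codes, 3))
# {'0-0-9': ['A', 'C'], '0-0-10': ['B'], '0-1-0': ['D']}
print(sorted(codes.values()))
# ['0-0-10-0', '0-0-9-1', '0-0-9-4', '0-1-0-0']   (lexicographical)
print(sorted(codes.values(), key=numeric_key))
# ['0-0-9-1', '0-0-9-4', '0-0-10-0', '0-1-0-0']   (numerical)
```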
Searching for LIN codes
The search GUI allows users to search for a specific LIN code, or for specific text within a LIN code. It also supports searching by LIN code cluster: entering the first N identifiers of a cluster, e.g. 0-0-0-0-0-0-41, returns all samples whose LIN codes start with 0-0-0-0-0-0-41.
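A cluster search of this kind amounts to a prefix match on whole identifiers. The sketch below is an illustration rather than Enterobase's search code, and the helper name and sample codes are hypothetical. It compares identifier by identifier, because a naive string test such as `code.startswith("0-0-0-0-0-0-41")` would also accept a code beginning 0-0-0-0-0-0-410.

```python
def in_lin_cluster(lin_code: str, cluster: str) -> bool:
    """True if `lin_code` begins with every identifier of `cluster`, in order."""
    code_parts = lin_code.split("-")
    cluster_parts = cluster.split("-")
    # Slicing to the cluster length compares whole identifiers, not characters.
    return code_parts[: len(cluster_parts)] == cluster_parts


samples = {
    "S1": "0-0-0-0-0-0-41-2-0",
    "S2": "0-0-0-0-0-0-410-0-0",
    "S3": "0-0-0-0-0-0-41-0-7",
}
print([name for name, code in samples.items() if in_lin_cluster(code, "0-0-0-0-0-0-41")])
# ['S1', 'S3']
```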
LIN codes and the API
On Enterobase, LIN codes are associated with cgMLST sequence types (STs), so downloading data associated with a cgMLST scheme using /api/v2.0/{database}/{scheme}/sts will also return the LIN code for each ST.
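A minimal standard-library sketch of such a download is shown below. Only the /api/v2.0/{database}/{scheme}/sts path comes from this page; the server root, HTTP Basic authentication with the API token as the username, and especially the JSON key names (`STs`, `ST_id`, `lin_code`) are assumptions to check against the Enterobase API documentation before use.

```python
import base64
import json
import urllib.request

# Assumed server root; the path below it is the one described on this page.
API_ROOT = "https://enterobase.warwick.ac.uk/api/v2.0"


def sts_url(database: str, scheme: str) -> str:
    """Build the /api/v2.0/{database}/{scheme}/sts URL."""
    return f"{API_ROOT}/{database}/{scheme}/sts"


def fetch_sts(database: str, scheme: str, api_token: str) -> dict:
    """Download ST records (assumes Basic auth with the token as the username)."""
    request = urllib.request.Request(sts_url(database, scheme))
    credentials = base64.b64encode(f"{api_token}:".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def lin_codes_by_st(payload: dict, records_key: str = "STs",
                    st_key: str = "ST_id", lin_key: str = "lin_code") -> dict:
    """Map each ST to its LIN code; all three key names are hypothetical."""
    return {rec[st_key]: rec.get(lin_key) for rec in payload.get(records_key, [])}
```

For example, `lin_codes_by_st(fetch_sts("senterica", "cgMLST_v2", token))` would return a dictionary keyed by ST, provided the example database and scheme names and the assumed key names match the real service. Any paging of large result sets is ignored here.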
Grapetree and Bubble Plots
Grapetrees can be produced and labelled using LIN codes, although because of the 1:1 mapping between LIN codes and hierCC identifiers, a LIN code based Grapetree is identical to a hierCC based Grapetree. In principle it should be possible to generate Bubble plots from LIN code data, but this is not currently supported. It is, however, possible to generate a Bubble plot from hierCC data and then label it with LIN code values, in the same way as for any other data held on Enterobase.
® Life Identification Number and LIN are registered trademarks of This Genomic Life, Inc. LSH and BAV report, in accordance with Virginia Tech policies and procedures and their ethical obligation as researchers, that they have a financial interest in This Genomic Life, Inc. that may be affected by the publication of this manuscript.