Setting Up a New Database Structure

The active databases are specified in the ACTIVE_DATABASES variable in config.py. This is a dictionary in which each key is the name of a database and each value is a list containing the following, in order:

  • The name of the genus - this is important because it is used to retrieve the appropriate records from the SRA and to check whether assemblies belong to the correct taxon
  • The URL of the database
  • A boolean showing whether the database is public (True) or private (False)
  • A number (its purpose is not documented here)
  • The three-letter code identifying the database

e.g.

'senterica': [
    'Salmonella',
    'postgresql://%s:%s@%s/senterica' % (USER, PASS, POSTGRES_SERVER),
    True,
    1,
    'SAL'
]
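
As a minimal sketch of how such an entry could be read, the snippet below unpacks the five positional values into named fields. The placeholder credentials, the DatabaseEntry tuple, its field names (including unknown_number) and the describe_database helper are assumptions made for illustration; they are not part of config.py.

```python
from collections import namedtuple

# Placeholder credentials; in config.py these come from the real settings.
USER, PASS, POSTGRES_SERVER = "user", "secret", "localhost:5432"

ACTIVE_DATABASES = {
    'senterica': [
        'Salmonella',
        'postgresql://%s:%s@%s/senterica' % (USER, PASS, POSTGRES_SERVER),
        True,
        1,
        'SAL',
    ],
}

# Hypothetical field names, in the order the values appear in the list.
DatabaseEntry = namedtuple(
    "DatabaseEntry", ["genus", "url", "is_public", "unknown_number", "code"]
)


def describe_database(name):
    """Return the ACTIVE_DATABASES entry for `name` as a named tuple."""
    return DatabaseEntry(*ACTIVE_DATABASES[name])


entry = describe_database('senterica')
print(entry.genus, entry.code, "public" if entry.is_public else "private")
```

Naming the positions this way avoids scattering index lookups such as ACTIVE_DATABASES[name][4] through the code.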

Adding a Column to the strains Table

  • Add the column to the strains and strains_archive tables in the database
  • Add the column to the Strains and StrainsArchive classes in the SQLAlchemy models located at entero/databases/<database_name>/models.py

    class Strains:          # existing base classes omitted
        new_column = Column("new_column", String(100))
    class StrainsArchive:   # existing base classes omitted
        new_column = Column("new_column", String(100))

  • Add the column description to the data_param table. If it corresponds to metadata in the SRA, fill in the sra_field with the appropriate JSON path, e.g. Sample,Metadata,Species
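
The first step, adding the column to both tables, could be sketched as follows. This is only an illustration: the add_column_statements helper is hypothetical, the VARCHAR(100) type is assumed to match String(100) in the models, and an in-memory SQLite database stands in for the real PostgreSQL one so that the snippet can run anywhere.

```python
import sqlite3


def add_column_statements(column, sql_type, tables=("strains", "strains_archive")):
    """Build one ALTER TABLE statement per table (hypothetical helper)."""
    return ["ALTER TABLE %s ADD COLUMN %s %s" % (table, column, sql_type)
            for table in tables]


# In-memory SQLite stands in for the real PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strains (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE strains_archive (id INTEGER PRIMARY KEY)")

for statement in add_column_statements("new_column", "VARCHAR(100)"):
    conn.execute(statement)


def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]


print(column_names("strains"))
print(column_names("strains_archive"))
```

Running the same statement against both tables keeps strains and strains_archive in step, which the matching model classes rely on.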

You can retrospectively add data to the column using the script update_sra_fields.
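
To illustrate how a comma-separated sra_field such as Sample,Metadata,Species might be resolved against nested SRA metadata, here is a small sketch. It is not the code of update_sra_fields; the resolve_sra_field function and the sample record are invented for the example, and real SRA records may be shaped differently.

```python
def resolve_sra_field(record, sra_field):
    """Follow a comma-separated path through nested dictionaries.

    Returns None if any key along the path is missing.
    """
    value = record
    for key in sra_field.split(","):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Invented record; real SRA metadata will have a different shape.
record = {"Sample": {"Metadata": {"Species": "Salmonella enterica"}}}

print(resolve_sra_field(record, "Sample,Metadata,Species"))
print(resolve_sra_field(record, "Sample,Metadata,Missing"))
```

Returning None for a missing key lets a bulk update skip records that lack the field instead of failing part-way through.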

Example: Adding Geographic Details

tabname  name                sra_field                          order  nested_orderlabel  datatype  groupname
strains  geographic_details  Sample,Metadata,geography details  5      9                  text      Location

class Strains(Base, mod.Strains):
    geographic_details = Column("geographic_details", String(100))