Adding a new Database
=====================
Creating the SQLAlchemy Model
-----------------------------
In entero/databases create a new folder (Python package) with the name of
your database. Then add the four files shown below (you can copy them from
another database directory):

.. code-block:: text

    entero
    └── databases
        ├── <database_name>
        │   ├── data_params.txt
        │   ├── db.py
        │   ├── models.py
        │   └── __init__.py
        └── senterica
        
``__init__.py``
^^^^^^^^^^^^^^^
The standard ``__init__`` file for any Python package.

.. code-block:: python

    from db import *

models.py
^^^^^^^^^
This contains all the table classes, which inherit most members (columns) from
the base classes in entero/databases/generic_models.py, although you are free
to add extra ones. If you add any new columns then you also need to describe
them in data_params.txt. The example below adds a new column 'species'.

.. code-block:: python

    from sqlalchemy import Integer, Column, ForeignKey, Table, String, DateTime, Time, Text, MetaData
    from sqlalchemy.ext.declarative import declarative_base
    from entero.databases import generic_models as mod

    metadata = MetaData()
    Base = declarative_base(metadata=metadata)

    TracesAssembly = mod.getTraceAssemblyTable(metadata)


    class Assemblies(Base, mod.AssembliesRel):
        pass

    class AssembliesArchive(Base, mod.AssembliesArchive):
        pass

    class Traces(Base, mod.Traces):
        pass

    class TracesArchive(Base, mod.TracesArchive):
        pass

    class Schemes(Base, mod.Schemes):
        pass

    class SchemesArchive(Base, mod.SchemesArchive):
        pass

    class Strains(Base, mod.Strains):
        # add any database specific fields here
        species = Column("species", String(200))

    class StrainsArchive(Base, mod.StrainsArchive):
        # duplicate any added strain fields here
        species = Column("species", String(200))

    class DataParam(Base, mod.DataParam):
        pass

    class AssemblyLookup(Base, mod.AssemblyLookup):
        pass

db.py
^^^^^
The database class inherits all functions and members from AbstractDatabase
(entero/databases/database.py) and usually no new methods have to be added.

.. code-block:: python

    from ..database import AbstractDatabase
    import datetime
    from models import *
    import os
    from sqlalchemy.sql import func, exists
    from sqlalchemy import and_, select, or_
    from entero import app
    from sqlalchemy.exc import IntegrityError

    class DB(AbstractDatabase):
        # add any database specific methods here
        pass

data_params.txt
^^^^^^^^^^^^^^^
This is a tab-delimited text file that should contain information about any
extra fields that you specified in models.py. The fields are described in
:doc:`developer-metadata-validation`. Below is an example for an extra field,
species, in the strains table. The value will be pulled from the SRA data
where possible, and users are limited to three values (please note that ':'
in vals should be '|', but that character could not be escaped in this table)

+---------------+--------+-----------------------+------------------+----------------+-------------+------------+-----------------------------------------------------------------------+------------+
| **tabname**   |**name**|**sra_field**          |**display_order** |**nested_order**|**label**    |**datatype**|**vals**                                                               |**required**|
+---------------+--------+-----------------------+------------------+----------------+-------------+------------+-----------------------------------------------------------------------+------------+
| strains       |species |Sample,Metadata,Species|7                 |0               |Species      |combo       | Cronobacter sakazakii:Cronobacter dublinensis:Cronobacter universalis |           0|
+---------------+--------+-----------------------+------------------+----------------+-------------+------------+-----------------------------------------------------------------------+------------+
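
As a rough illustration, a line like the one in the table could be produced
with a few lines of Python. This is only a sketch: the column order is taken
from the table above, ``make_data_param_row`` is a hypothetical helper that is
not part of the code base, and whether data_params.txt expects a header line is
not stated here, so none is written. The allowed values are joined with '|' as
noted above.

.. code-block:: python

    # Column order assumed from the example table above.
    COLUMNS = ["tabname", "name", "sra_field", "display_order", "nested_order",
               "label", "datatype", "vals", "required"]


    def make_data_param_row(field):
        """Return one tab-delimited line for a field described by a dict."""
        values = dict(field)
        # The allowed values are stored as a single '|' separated string.
        values["vals"] = "|".join(field.get("vals", []))
        return "\t".join(str(values[col]) for col in COLUMNS)


    species_row = make_data_param_row({
        "tabname": "strains",
        "name": "species",
        "sra_field": "Sample,Metadata,Species",
        "display_order": 7,
        "nested_order": 0,
        "label": "Species",
        "datatype": "combo",
        "vals": ["Cronobacter sakazakii", "Cronobacter dublinensis",
                 "Cronobacter universalis"],
        "required": 0,
    })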


Adding the database in the config
---------------------------------
In entero/__init__.py add an item to the ACTIVE_DATABASES dictionary with the
name of the database as the key and a list containing the following:

* The name of the genus. This is important as it is used to retrieve the
  appropriate records from the SRA and to check whether assemblies are of the
  correct taxon
* The URL of the database
* A boolean showing whether the database is public (True) or private (False)
* Whether the database is active (1) or not (0). While the database is being
  created, this should be set to 0
* The three-letter code identifying the database

For example:

.. code-block:: python

    'cronobacter': [
        'Cronobacter',
        'postgresql://%s:%s@%s/cronobacter' % (USER, PASS, POSTGRES_SERVER),
        True,
        0,
        'CRO'
    ]
    
    
Also in config.py the DB_CODES dictionary needs to be updated with the
database name as the key and a list comprising a name and the code, e.g.

.. code-block:: python

    'cronobacter' : ['cronobacter','CRO']
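
The two entries have to agree with each other, which is easy to get wrong when
copying from another database. The sketch below shows one way such a check
could look. ``check_database_entry`` is a hypothetical helper, not part of the
code base, and the connection settings are placeholders.

.. code-block:: python

    # Placeholder connection settings (assumptions for this sketch only).
    USER, PASS, POSTGRES_SERVER = "user", "secret", "localhost:5432"

    ACTIVE_DATABASES = {
        'cronobacter': [
            'Cronobacter',   # genus name
            'postgresql://%s:%s@%s/cronobacter' % (USER, PASS, POSTGRES_SERVER),
            True,            # public database
            0,               # inactive while the database is being set up
            'CRO',           # three-letter code
        ],
    }

    DB_CODES = {
        'cronobacter': ['cronobacter', 'CRO'],
    }


    def check_database_entry(name, active_databases, db_codes):
        """Return a list of problems found in the config entries for name."""
        entry = active_databases.get(name)
        if entry is None:
            return ["%s is missing from ACTIVE_DATABASES" % name]
        if len(entry) != 5:
            return ["expected 5 items for %s, found %d" % (name, len(entry))]
        genus, url, public, active, code = entry
        problems = []
        if not isinstance(public, bool):
            problems.append("public flag should be True or False")
        if active not in (0, 1):
            problems.append("active flag should be 0 or 1")
        if len(code) != 3:
            problems.append("database code should have three letters")
        if name not in db_codes:
            problems.append("%s is missing from DB_CODES" % name)
        elif db_codes[name][1] != code:
            problems.append("code in DB_CODES does not match ACTIVE_DATABASES")
        return problems


    problems = check_database_entry('cronobacter', ACTIVE_DATABASES, DB_CODES)

An empty list means that the two dictionaries are consistent for that database.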


Creating the Actual Database
----------------------------
First of all the database must be physically created. This can be done from a
PostgreSQL admin GUI (right click on Databases > New Database..., then type in
the name and press OK) or from the PostgreSQL command line with

.. code-block:: sql

    CREATE DATABASE <database name>
      WITH OWNER = <owner_name>
           ENCODING = 'UTF8'
           TABLESPACE = pg_default
           LC_COLLATE = 'en_GB.UTF-8'
           LC_CTYPE = 'en_GB.UTF-8'
           CONNECTION LIMIT = -1;

This must be done using the PostgreSQL admin account (e.g. postgres), for example by running psql as follows

.. code-block:: bash

    sudo -su postgres psql -p<portno>  

Next it can be populated with tables, based on the models, using the script
create_new_database (described in the Maintenance Scripts page)

.. code-block:: bash

    python manage.py create_new_database -d cronobacter


This script also populates the data_param table and adds a few schemes which
are generic to all databases

* **assembly_stats** Displays information about the assembly. The fields for this scheme have been added to data_param
* **ref_masker** This scheme is not displayed to the user, but contains information about repeats in a genome and is used when calling SNPs
* **snp_calls** Again, this scheme is not displayed, but it stores information about SNPs called for a strain against a particular reference
* **prokka_annotation** Contains information pointing to the annotation files (GFF and GenBank) for genomes. Again, the fields for this scheme have been added to the data_param table

**N.B.** Now change the entry in ACTIVE_DATABASES in config.py to make the
database active (change the fourth value to 1)

Populating the Database
-----------------------
Insert a scheme hosted in NServ
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can insert a scheme that has been defined in NServ into the database using the script "manage.py add_nserv_scheme".

To insert an rMLST scheme into the Cronobacter database:

.. code-block:: bash

    python manage.py add_nserv_scheme -d cronobacter -s rMLST -r Cronobacter_rMLST

To insert a wgMLST scheme for Cronobacter and grant access to the users listed in the '-u' parameter:

.. code-block:: bash

    python manage.py add_nserv_scheme -d cronobacter -s wgMLST -r CROwgMLST_wgMLST -u user_name_1,user_name_2


Importing SRA Data
^^^^^^^^^^^^^^^^^^
You can populate the database using the script importSRA (described in the
Maintenance Scripts page). To import all SRA records for the last 3000 days
(a little over eight years)

.. code-block:: bash

    python manage.py importSRA -d cronobacter -r 3000
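
Going by the example above, the value given to -r is a number of days. A small
sketch for working out that number for a given period follows. The helper name
is hypothetical, and a year is approximated as 365.25 days.

.. code-block:: python

    import math


    def days_for_years(years):
        """Approximate number of days covering the given number of years."""
        return int(math.ceil(years * 365.25))


    ten_years = days_for_years(10)                  # 3653
    three_thousand_days_in_years = 3000 / 365.25    # about 8.2

So -r 3000 covers a little over eight years, while ten full years would need
-r 3653.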

Assembling Genomes
^^^^^^^^^^^^^^^^^^
To assemble genomes you can either use the web interface or run the script
update_assemblies (described in the Maintenance Scripts page), which will
check for any un-assembled strains and send them off for assembly. For
example, to send two strains for assembly with high priority:

.. code-block:: bash

    python manage.py update_assemblies -p -9 -d cronobacter -l 2

Calling Schemes
^^^^^^^^^^^^^^^
Schemes also need to be called for each assembly. Again this can be done
through the web interface or using the script update_all_schemes (described
in the Maintenance Scripts page). In this case probably only the prokka
annotation will be called, as it is the only scheme in the database.

.. code-block:: bash

    python manage.py update_all_schemes -d cronobacter -p 9


To automate the import, assembly and calling of schemes, you can add the
name of the database to the shell scripts scripts/daily_update.sh and
scripts/daily_import.sh (see the scripts section of the Overall Structure page)
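
The order of the steps above can be summarised in a short sketch. It only
builds the command lines, using the options shown in the examples on this
page, and does not reproduce the actual shell scripts. ``update_commands`` is
a hypothetical helper.

.. code-block:: python

    def update_commands(database, days=3000, assembly_priority=-9,
                        scheme_priority=9):
        """Return the manage.py commands for one database, in running order."""
        return [
            # 1. import SRA records for the given number of days
            ["python", "manage.py", "importSRA",
             "-d", database, "-r", str(days)],
            # 2. send un-assembled strains off for assembly
            ["python", "manage.py", "update_assemblies",
             "-p", str(assembly_priority), "-d", database],
            # 3. call schemes on the assemblies
            ["python", "manage.py", "update_all_schemes",
             "-d", database, "-p", str(scheme_priority)],
        ]


    commands = update_commands("cronobacter")
    command_lines = [" ".join(cmd) for cmd in commands]

Each list could be passed to ``subprocess.check_call`` from the directory that
contains manage.py.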


Import whole genome
^^^^^^^^^^^^^^^^^^^
This imports assembled genomes from GenBank.

.. code-block:: bash

    python manage.py import_whole_genome -d cronobacter

Only complete genomes will be imported unless you add (-c False) to the command.