Maintenance Scripts
===================

The maintenance scripts are defined in ``manage.py`` in the top-level directory. Assuming you are in the Enterobase virtual environment and in the top-level directory, run a script with the following syntax:

.. code-block:: bash

   python manage.py script_name parameters

For example:

.. code-block:: bash

   python manage.py check_queued_jobs -d ecoli -s MLST_Achtman
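
When the scripts are driven from a scheduler or another Python program, it can help to build the command line in one place. The sketch below is illustrative only: ``manage_command`` is a hypothetical helper, not part of ``manage.py``, and it assumes every parameter takes exactly one value.

.. code-block:: python

   import shlex


   def manage_command(script, options):
       """Build the argument list for ``python manage.py <script> <options>``.

       Hypothetical helper: ``options`` maps a flag such as "-d" to its value.
       """
       argv = ["python", "manage.py", script]
       for flag, value in options.items():
           argv.extend([flag, str(value)])
       return argv


   cmd = manage_command("check_queued_jobs", {"-d": "ecoli", "-s": "MLST_Achtman"})
   print(shlex.join(cmd))  # python manage.py check_queued_jobs -d ecoli -s MLST_Achtman

The resulting list can be passed to ``subprocess.run(cmd, check=True)`` from the top-level directory, inside the virtual environment. Keeping the arguments as a list avoids shell quoting problems.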

Database Scripts
----------------

create_new_database
~~~~~~~~~~~~~~~~~~~

Adds the rMLST scheme to the database.

* ``-d``, ``--database``: The name of the database (required).
* ``-c``, ``--create_schemes``: Whether to create generic schemes such as assembly_stats and snp_calling (default ``True``).

backup_database
~~~~~~~~~~~~~~~

Backs up the database in PostgreSQL custom format to a subfolder of the specified folder; the subfolder is named with the current date.

* ``-f``, ``--folder``: The folder in which to create the backup (a subfolder named with the current date is created inside it).
* ``-d``, ``--database``: The name of the database to back up. By default, all active databases are backed up.
* ``-s``, ``--system``: By default, the system database is also backed up. Set this parameter to ``False`` to skip it.
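
As an illustration of the layout described above, the sketch below computes the dated subfolder and a ``pg_dump`` command that writes PostgreSQL's custom archive format (``-Fc``). The ISO date (``YYYY-MM-DD``) used for the subfolder and the ``<database>.dump`` file name are assumptions; ``backup_plan`` is a hypothetical helper and the real script may name things differently.

.. code-block:: python

   import datetime
   import os


   def backup_plan(folder, database, today=None):
       """Return (dated_subfolder, pg_dump_argv) for one database.

       Assumptions: ISO-date subfolder name and a <database>.dump file name.
       """
       today = today or datetime.date.today()
       subfolder = os.path.join(folder, today.isoformat())
       dump_file = os.path.join(subfolder, database + ".dump")
       # -Fc selects the custom archive format; -f names the output file.
       argv = ["pg_dump", "-Fc", "-f", dump_file, database]
       return subfolder, argv


   subfolder, argv = backup_plan("/backups", "ecoli", datetime.date(2024, 1, 31))
   print(subfolder)  # /backups/2024-01-31 (on POSIX systems)

A custom-format archive is restored with ``pg_restore`` rather than ``psql``, e.g. ``pg_restore -d <target_database> <dump_file>``.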

Scripts For Jobs
----------------

Schemes
~~~~~~~

update_all_schemes
^^^^^^^^^^^^^^^^^^

Checks all complete assemblies that have passed QC to see whether every scheme has been called on them or is queued. Any outstanding jobs are sent.

* ``-d``, ``--dbName``: The database (default ``senterica``).
* ``-l``, ``--limit``: The maximum number of jobs to send per scheme (default 100).
* ``-f``, ``--force``: If ``True``, ``T``, ``t`` or ``true``, jobs that have failed more than 5 times are also sent (default ``False``).
* ``-q``, ``--queue``: The queue in which the jobs will be placed (default ``backend``).
* ``-p``, ``--priority``: The priority of the jobs, from 9 (lowest) to -9 (highest) (default 0).
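
The selection rules above can be sketched as follows. The ``Assembly`` record, its status strings and the ``jobs_to_send`` helper are hypothetical simplifications, not EnteroBase's data model; the ``-f`` value is matched literally against the four strings listed above.

.. code-block:: python

   from dataclasses import dataclass, field

   FORCE_VALUES = {"True", "T", "t", "true"}  # values documented for -f/--force


   @dataclass
   class Assembly:
       """Hypothetical, simplified assembly record."""
       barcode: str
       complete: bool
       passed_qc: bool
       status: dict = field(default_factory=dict)    # scheme -> "called" or "queued"
       failures: dict = field(default_factory=dict)  # scheme -> number of failed jobs


   def jobs_to_send(assemblies, schemes, limit=100, force="false", max_failures=5):
       """Map each scheme to the barcodes that still need a job, at most `limit` per scheme."""
       forced = force in FORCE_VALUES
       plan = {}
       for scheme in schemes:
           picked = []
           for a in assemblies:
               if len(picked) >= limit:
                   break  # per-scheme cap (-l/--limit)
               if not (a.complete and a.passed_qc):
                   continue  # only complete assemblies that passed QC
               if a.status.get(scheme) in ("called", "queued"):
                   continue  # already called, or already waiting in a queue
               if a.failures.get(scheme, 0) > max_failures and not forced:
                   continue  # failed more than 5 times: skipped unless forced
               picked.append(a.barcode)
           plan[scheme] = picked
       return plan


   assemblies = [
       Assembly("A1", True, True, status={"rMLST": "called"}),
       Assembly("A2", True, True, failures={"rMLST": 6}),
       Assembly("A3", True, False),
       Assembly("A4", True, True),
   ]
   print(jobs_to_send(assemblies, ["rMLST"]))             # {'rMLST': ['A4']}
   print(jobs_to_send(assemblies, ["rMLST"], force="t"))  # {'rMLST': ['A2', 'A4']}

In this model, ``update_scheme`` corresponds to a call with a single scheme, and ``update_all_schemes`` to a call with every scheme in the database.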

update_scheme
^^^^^^^^^^^^^

Checks all complete assemblies that have passed QC to see whether the specified scheme has been called on them or is queued. Any outstanding jobs are sent.

* ``-d``, ``--dbName``: The database (default ``senterica``).
* ``-l``, ``--limit``: The maximum number of jobs to send (default 100).
* ``-s``, ``--scheme``: The name of the scheme (default ``rMLST``).
* ``-f``, ``--force``: If ``True``, ``T``, ``t`` or ``true``, jobs that have failed more than 5 times are also sent (default ``False``).
* ``-q``, ``--queue``: The queue in which the jobs will be placed (default ``backend``).
* ``-p``, ``--priority``: The priority of the jobs, from 9 (lowest) to -9 (highest) (default 0).

check_queued_jobs
^^^^^^^^^^^^^^^^^

Forces callbacks on all jobs that are currently queued for the specified scheme.

* ``-d``, ``--database``: The database (default ``senterica``).
* ``-s``, ``--scheme``: The name of the scheme (default ``rMLST``).
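
One plausible reading of "forcing a callback", suggested by the description of ``check_queued_assemblies``, is re-running the normal completion callback so that results whose callback was missed are picked up. The sketch below illustrates that reading only; ``force_callbacks`` and the job dictionaries are hypothetical, and ``fake_callback`` stands in for the real callback, which would query the job server.

.. code-block:: python

   def force_callbacks(queued_jobs, scheme, callback):
       """Invoke `callback` for every queued job of `scheme`; return the job numbers handled."""
       handled = []
       for job in queued_jobs:
           if job["scheme"] == scheme:
               callback(job["id"])
               handled.append(job["id"])
       return handled


   finished = {2}  # pretend the job server reports job 2 as complete
   stored = []


   def fake_callback(job_id):
       # Stand-in for the real callback: only finished jobs produce a stored result.
       if job_id in finished:
           stored.append(job_id)


   queued = [
       {"id": 1, "scheme": "rMLST"},
       {"id": 2, "scheme": "rMLST"},
       {"id": 3, "scheme": "MLST_Achtman"},
   ]
   print(force_callbacks(queued, "rMLST", fake_callback))  # [1, 2]
   print(stored)  # [2]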

kill_duplicate_jobs
^^^^^^^^^^^^^^^^^^^

Tries to find all duplicate nserv jobs, i.e. multiple jobs for the same assembly/scheme combination. Duplicate entries are removed from the assembly lookup and the associated jobs are killed.

* ``-d``, ``--database``: The database (default ``senterica``).
* ``-s``, ``--scheme``: The name of the scheme (default ``rMLST``).
* ``-k``, ``--kill``: If ``False``, the entry is only removed from the database and the job is not killed (default ``True``).
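
Finding duplicates amounts to grouping jobs by their (assembly, scheme) pair. In this sketch, which job survives is an assumption (the one with the lowest job number); the real script may keep a different one. ``duplicate_jobs`` is a hypothetical helper; each job it returns would then have its assembly-lookup entry removed and, unless ``-k False`` is given, be killed.

.. code-block:: python

   from collections import defaultdict


   def duplicate_jobs(jobs):
       """Return the job numbers to remove, keeping one job per (assembly, scheme).

       `jobs` is an iterable of (job_number, assembly_barcode, scheme) tuples.
       Assumption: the lowest job number in each group is the one kept.
       """
       groups = defaultdict(list)
       for job_number, assembly, scheme in jobs:
           groups[(assembly, scheme)].append(job_number)
       extra = []
       for numbers in groups.values():
           extra.extend(sorted(numbers)[1:])  # everything except the lowest number
       return sorted(extra)


   jobs = [
       (101, "ASM1", "rMLST"),
       (105, "ASM1", "rMLST"),
       (102, "ASM2", "rMLST"),
       (107, "ASM1", "MLST_Achtman"),
   ]
   print(duplicate_jobs(jobs))  # [105]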

Assemblies
~~~~~~~~~~

update_assemblies
^^^^^^^^^^^^^^^^^

Checks for any strains in the database that do not have an assembly but can be assembled (e.g. they have paired Illumina reads) and sends them for assembly. By default, assemblies that have failed more than 5 times are not resent (see ``-f``).

* ``-d``, ``--dbName``: The database (default ``senterica``).
* ``-l``, ``--limit``: The maximum number of assemblies to send (default 100).
* ``-f``, ``--force``: An assembly is not sent if its number of failures is greater than this number (default 5).
* ``-q``, ``--queue``: The queue in which the assemblies will be placed (default ``backend``).
* ``-p``, ``--priority``: The priority of the assemblies, from 9 (lowest) to -9 (highest) (default 0).
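
The candidate selection for ``update_assemblies`` might look like the following sketch. The strain keys and ``strains_to_assemble`` are hypothetical, and paired Illumina reads are used as the only "can be assembled" criterion because that is the example the description gives; the real check may accept other read types.

.. code-block:: python

   def strains_to_assemble(strains, limit=100, max_failures=5):
       """Return IDs of strains that should be sent for assembly.

       Each strain is a dict with hypothetical keys: id, has_assembly,
       platform, paired and failures.
       """
       picked = []
       for strain in strains:
           if len(picked) >= limit:
               break  # -l/--limit
           if strain["has_assembly"]:
               continue  # already assembled
           if strain["platform"] != "ILLUMINA" or not strain["paired"]:
               continue  # not assemblable under this sketch's assumption
           if strain["failures"] > max_failures:
               continue  # -f/--force: more failures than this means "do not send"
           picked.append(strain["id"])
       return picked


   strains = [
       {"id": "S1", "has_assembly": True, "platform": "ILLUMINA", "paired": True, "failures": 0},
       {"id": "S2", "has_assembly": False, "platform": "ILLUMINA", "paired": True, "failures": 2},
       {"id": "S3", "has_assembly": False, "platform": "OXFORD_NANOPORE", "paired": False, "failures": 0},
       {"id": "S4", "has_assembly": False, "platform": "ILLUMINA", "paired": True, "failures": 6},
   ]
   print(strains_to_assemble(strains))                  # ['S2']
   print(strains_to_assemble(strains, max_failures=6))  # ['S2', 'S4']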

check_queued_assemblies
^^^^^^^^^^^^^^^^^^^^^^^

Forces callbacks on all assemblies that are currently queued, to check whether the callback was missed for any that have completed.

* ``-d``, ``--database``: The database (default ``senterica``).

General
~~~~~~~

runcelery
^^^^^^^^^

Runs Celery.

* ``-t``, ``--threads``: The number of threads (default 1).
* ``-q``, ``--queue``: The job queue (default ``celery``).

update_job
^^^^^^^^^^

Forces callbacks on the specified job(s).

* ``-j``, ``--job``: The job number, or multiple job numbers separated by commas.

change_job_priority
^^^^^^^^^^^^^^^^^^^

Changes the priority of the specified job(s).

* ``-j``, ``--job``: The job number, or multiple job numbers separated by commas (default 0).
* ``-u``, ``--user``: Change the priority of all jobs submitted by this user (default none).
* ``-p``, ``--priority``: The new priority value (default -9).
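
Both ``update_job`` and ``change_job_priority`` take one job number or several separated by commas, and priorities run from 9 (lowest) to -9 (highest), so a more negative value presumably means a job is picked up sooner. A sketch of both conventions, using hypothetical helper names:

.. code-block:: python

   def parse_job_numbers(value):
       """'101,102, 103' -> [101, 102, 103]; empty entries are ignored."""
       return [int(part) for part in value.split(",") if part.strip()]


   def check_priority(priority):
       """Reject values outside the documented range of 9 (lowest) to -9 (highest)."""
       if not -9 <= priority <= 9:
           raise ValueError(f"priority must be between -9 and 9, got {priority}")
       return priority


   # Assumption: a lower (more negative) priority value is served first.
   queued = {101: 0, 102: -9, 103: 5}  # job number -> priority
   print(parse_job_numbers("101,102, 103"))  # [101, 102, 103]
   print(sorted(queued, key=queued.get))     # [102, 101, 103]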

Scripts For Importing Data
--------------------------

import_whole_genome
~~~~~~~~~~~~~~~~~~~

Imports assembled genomes from GenBank.

* ``-d``, ``--database``: The name of the database.
* ``-t``, ``--term``: Keyword that identifies the assemblies. It can be specific, e.g. an accession number, or broader, such as the species name (default none: all assembled genomes for the database's genus are imported).
* ``-c``, ``--complete``: If ``T``, ``True`` or ``true``, only complete genomes are imported, not contigs or scaffolds (default ``True``).
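
NCBI describes each assembly with an assembly level (``Complete Genome``, ``Chromosome``, ``Scaffold`` or ``Contig``). Assuming the ``-c`` option filters on that field, it would behave roughly like the sketch below; ``select_genomes`` and the record layout are hypothetical, and treating only ``Complete Genome`` as complete (excluding ``Chromosome``) is an assumption.

.. code-block:: python

   COMPLETE_VALUES = {"T", "True", "true"}  # values documented for -c/--complete


   def select_genomes(records, complete="True"):
       """Filter GenBank records according to the -c/--complete value.

       Each record is a dict with a hypothetical `assembly_level` key holding
       NCBI's assembly level. Assumption: only "Complete Genome" counts as complete.
       """
       if complete not in COMPLETE_VALUES:
           return list(records)
       return [r for r in records if r["assembly_level"] == "Complete Genome"]


   records = [
       {"name": "example-1", "assembly_level": "Complete Genome"},
       {"name": "example-2", "assembly_level": "Contig"},
       {"name": "example-3", "assembly_level": "Scaffold"},
   ]
   print([r["name"] for r in select_genomes(records)])    # ['example-1']
   print(len(select_genomes(records, complete="False")))  # 3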

load_user_reads
~~~~~~~~~~~~~~~

Checks which reads from a particular user need uploading, attempts to copy the read files from the specified folder (local or FTP) and, if the copy succeeds, initiates all the analyses.

* ``-d``, ``--database``: The name of the database.
* ``-f``, ``--folder``: The folder, either local or remote, that contains the user's read files.
* ``-r``, ``--remote``: If ``True``, the folder is a remote FTP folder (default ``False``).
* ``-s``, ``--settings``: The details of the FTP site, in the format ``address,user_name,password`` (default: the script author's EBI drop box details).
* ``-u``, ``--user``: The username of the user.
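
The ``-s`` value packs three fields into one comma-separated string. The sketch below splits it into a hypothetical ``FtpSettings`` type; splitting at most twice keeps any commas inside the password, although whether the real script does the same is not known.

.. code-block:: python

   from typing import NamedTuple


   class FtpSettings(NamedTuple):
       """Hypothetical container for the -s/--settings value."""
       address: str
       user_name: str
       password: str


   def parse_ftp_settings(value):
       """'ftp.example.org,alice,s3cret' -> FtpSettings(...).

       Splits at most twice, so a comma inside the password is preserved.
       """
       parts = value.split(",", 2)
       if len(parts) != 3:
           raise ValueError("expected address,user_name,password")
       return FtpSettings(*parts)


   settings = parse_ftp_settings("ftp.example.org,alice,pa,ss")
   print(settings.password)  # pa,ss

The parsed values map directly onto Python's standard library client, ``ftplib.FTP(settings.address, settings.user_name, settings.password)``.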

update_sra_fields
~~~~~~~~~~~~~~~~~

Updates all records with metadata taken from the SRA. This can be used to retrospectively add data to a new column.

* ``-d``, ``--database``: The name of the database.
* ``-f``, ``--fields``: The field, or a comma-separated list of fields, to update.

import_sra_data
~~~~~~~~~~~~~~~

Imports data for a particular sample, accession or project ID.

* ``-d``, ``--database``: The name of the database.
* ``-p``, ``--project``: The ID of the project. This should be the SRA project ID, e.g. ERP020979, not the BioProject ID.
* ``-s``, ``--sample``: The ID of the sample (or a comma-delimited list of IDs).
* ``-a``, ``--accession``: The accession (run) ID (or a comma-delimited list of IDs).
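
SRA study IDs start with ``SRP``, ``ERP`` or ``DRP`` (e.g. ``ERP020979``), whereas BioProject IDs start with ``PRJ`` (``PRJNA``, ``PRJEB``, ``PRJDB``). The sketch below shows a hypothetical pre-check along those lines, together with splitting the comma-delimited lists accepted by ``-s`` and ``-a``; neither helper is part of ``manage.py``.

.. code-block:: python

   import re

   SRA_STUDY_ID = re.compile(r"^[SED]RP\d+$")  # e.g. ERP020979


   def check_project_id(project_id):
       """Raise if `project_id` is not an SRA study ID (hypothetical pre-check)."""
       if project_id.startswith("PRJ"):
           raise ValueError(f"{project_id} is a BioProject ID; use the SRA study ID instead")
       if not SRA_STUDY_ID.match(project_id):
           raise ValueError(f"{project_id} does not look like an SRA study ID")
       return project_id


   def split_ids(value):
       """'ERR1,ERR2 , ERR3' -> ['ERR1', 'ERR2', 'ERR3']"""
       return [part.strip() for part in value.split(",") if part.strip()]


   print(check_project_id("ERP020979"))  # ERP020979
   print(split_ids("ERR1,ERR2 , ERR3"))  # ['ERR1', 'ERR2', 'ERR3']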

importSRA
~~~~~~~~~

Imports data from the SRA, either from a file or as all new entries since a specified date.

* ``-r``, ``--reldate``: Integer. Imports all records from the last X days, e.g. ``-r 30`` imports all short reads that are not already in Enterobase and were added to the SRA in the last 30 days (default 7).
* ``-f``, ``--file_loc``: If records are to be loaded from a file rather than directly from the SRA, the location of the file (JSON format).
* ``-d``, ``--db``: The name of the database.
* ``-l``, ``--live``: If ``True``, the task is run via Celery (default ``False``).
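
``-r`` defines a window of X days ending today. The sketch below computes the cut-off date and filters a batch of records, such as one loaded from the JSON file given with ``-f``. ``new_records``, the record keys (``run``, ``added``) and the JSON layout are assumptions made for illustration.

.. code-block:: python

   import datetime
   import json


   def new_records(records, existing_runs, reldate=7, today=None):
       """Keep records added in the last `reldate` days whose run is not yet in the database.

       Assumption: each record has a "run" accession and an ISO "added" date.
       """
       today = today or datetime.date.today()
       cutoff = today - datetime.timedelta(days=reldate)
       return [
           r for r in records
           if datetime.date.fromisoformat(r["added"]) >= cutoff
           and r["run"] not in existing_runs
       ]


   batch = json.loads("""[
       {"run": "ERR0001", "added": "2024-03-01"},
       {"run": "ERR0002", "added": "2024-03-22"},
       {"run": "ERR0003", "added": "2024-03-25"}
   ]""")
   today = datetime.date(2024, 3, 28)
   print([r["run"] for r in new_records(batch, {"ERR0003"}, reldate=7, today=today)])  # ['ERR0002']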

Scripts For JBrowse
-------------------

make_jbrowse_annotation
~~~~~~~~~~~~~~~~~~~~~~~

Creates all the files and configuration needed to display an assembly in JBrowse. For GenBank assemblies, a track for the GenBank annotation is created; for in-house assemblies, a track showing the quality of each base (based on the FASTQ file) is generated. Tracks for Prokka annotations and for every scheme in the database are also created, as well as a GC content track.

* ``-d``, ``--database``: The name of the database.
* ``-b``, ``--barcode``: The barcode of the assembly to be annotated.
* ``-f``, ``--force``: If ``True``, an existing annotation is overwritten. If ``False`` and an annotation already exists, nothing is done (default ``False``).
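
The track list and the ``-f`` behaviour described above can be summarised in a short sketch. The track names are descriptive labels only, not JBrowse configuration keys, and ``planned_tracks``, ``should_annotate`` and the ``source`` values are hypothetical.

.. code-block:: python

   def planned_tracks(source, schemes):
       """List the tracks an annotation run would create for one assembly."""
       # GenBank assemblies get their own annotation; in-house ones get per-base quality.
       first = "GenBank annotation" if source == "genbank" else "base quality (from FASTQ)"
       return [first, "prokka annotation", *schemes, "GC content"]


   def should_annotate(already_annotated, force=False):
       """With force=False an existing annotation is left untouched."""
       return force or not already_annotated


   print(planned_tracks("in_house", ["rMLST", "MLST_Achtman"]))
   print(should_annotate(True))              # False
   print(should_annotate(True, force=True))  # True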