The Calculation Engine

The Calculation Engine (TCE) is a platform that:

  • Manages a set of bioinformatic tools.
  • Manages a set of calculation nodes.
  • Hosts RESTful APIs for communication with EnteroBase.
  • Queues and distributes user tasks from APIs into calculation nodes.
  • Communicates with RCatch to download SRAs for user tasks.

Components within TCE

TCE implements a number of bioinformatic tools. Their parameters and outputs are documented on the pages for the individual tools.

API

TCE URI

In the examples below, the TCE URI is configuration dependent: it depends on the system on which TCE runs and on the port on which the server is configured to listen.

Hello World usage / welcome message

Visiting the raw URI for TCE, i.e. the root, gives a welcome message.

http://<TCE Host>:<TCE Port>

Supposing, for example, that TCE runs on calculationengineserver.ebbackend.org and listens on port 1234, then this would be:

http://calculationengineserver.ebbackend.org:1234
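As a minimal sketch, the root URI can be assembled from the host and port in Python. The helper names tce_base_url and fetch_welcome are hypothetical and not part of TCE, and the host and port are the illustrative values used above. The format of the welcome message is not specified here, so the sketch returns it as raw text.

```python
from urllib.request import urlopen


def tce_base_url(host: str, port: int) -> str:
    """Build the TCE root URI from a host name and a port (hypothetical helper)."""
    return f"http://{host}:{port}"


def fetch_welcome(base_url: str, timeout: float = 10.0) -> str:
    """GET the root URI and return the body as text; needs a reachable TCE server."""
    with urlopen(base_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


base = tce_base_url("calculationengineserver.ebbackend.org", 1234)
print(base)
```

Calling fetch_welcome(base) only works against a running TCE instance, so it is defined but not invoked here.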

Submit a new task

A new task is submitted with an HTTP POST request to the URL

http://<TCE Host>:<TCE Port>/head/submit

A standard task request in JSON format:

{
    "pipeline": "QAssembly",            # QAssembly; QA_evaluation; QAtoFasta; or nomenclature
    "version": "",                         # Default is the "current version". Otherwise specify a version number
    "inputs": {},                          # Input files for the bioinformatic tool (see details in wiki page for each tool)
    "params": {                            # Parameters for the bioinformatic tool (see details in wiki page for each tool)
      "prefix": "SAL_BA2277AA_AS",
      "scheme": "Senterica_UoW"
    },
    "reads": {                             # Read accessions for the bioinformatic tool (see details in wiki page for each tool)
      "read": "SRR1575956"
    },
    "source_ip": "xxx.xxx.xxx.xxx",        # Domain or IP address that TCE calls back, using the subURL specified in the TCE head node ini file under callback_url. The standard subURL is /website_callback
    "usr": "",                             # Not fully tested yet. Names a user whose dedicated CPU nodes TCE should use. These nodes are stored in the CR_node.ini file under the usr designation and are used for private calculations if this field is specified
    "workgroup": "public",                 # One of three queues for assignment to calculation nodes with the same workgroup designation: usr_uploaded, public, or backend. Any other queue name is assigned to the public queue. Within each queue, jobs can have different priorities from -9 (highest) to +9 (lowest). The queues continue to attempt to assign jobs to the nodes until the job has been accepted for processing. When the public queue is empty, it will also service the other two queues.
    "comment": ""                          # Any information that is specified by users
}
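The request above could be prepared in Python roughly as follows. This is a sketch under assumptions: the "#" annotations are explanatory only and must be removed to obtain valid JSON, and the JSON document is assumed to be sent as the POST body with a Content-Type of application/json, which this page does not state. build_submit_request is a hypothetical helper name.

```python
import json
from urllib.request import Request


def build_submit_request(base_url: str, task: dict) -> Request:
    """Wrap a task description in a POST request to /head/submit.

    Assumption: TCE accepts the JSON document as the request body.
    """
    body = json.dumps(task).encode("utf-8")
    return Request(
        f"{base_url}/head/submit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


task = {
    "pipeline": "QAssembly",
    "version": "",            # empty string selects the current version
    "inputs": {},
    "params": {"prefix": "SAL_BA2277AA_AS", "scheme": "Senterica_UoW"},
    "reads": {"read": "SRR1575956"},
    "source_ip": "xxx.xxx.xxx.xxx",
    "usr": "",
    "workgroup": "public",
    "comment": "",
}
request = build_submit_request("http://calculationengineserver.ebbackend.org:1234", task)
print(request.get_method(), request.full_url)
```

The request object is only built, not sent. Passing it to urllib.request.urlopen would submit the task to a running TCE instance.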

Get information for a task with tag

For example, to get information for a task with the tag 75121 use the URL

http://<TCE Host>:<TCE Port>/head/show_job/75121

in a GET request.

Additional fields in the JSON response:

{
    "tag":75121,                                                               # job tag, the unique key to retrieve information
    "status": "COMPLETE",                                                      # status can be "WAIT RESOURCE", "QUEUE", "RUNNING", "COMPLETE", "KILLED" or "FAILED"
    "log":"...",                                                               # standard output of the bioinformatic tool. It can be a JSON string describing the outputs of the pipeline (see details in wiki page for each tool)
    "err":"...",                                                               # running message that is generated during the run. Gives error messages if the job status is failed.
    "outputs": {                                                               # files generated by the bioinformatic tool.  (see details in wiki page for each tool)
      "assembly": [
        "SAL_BA2277AA_AS.scaffold.fastq",
        "/share_space/interact/outputs/75/75121/SAL_BA2277AA_AS.scaffold.fastq"
      ],
      "assembly_fasta": [
        "SAL_BA2277AA_AS.scaffold.fasta",
        "/share_space/interact/outputs/75/75121/SAL_BA2277AA_AS.scaffold.fasta"
      ]
    },
    "query_time":"",                                                           # important timestamps for the task
    "launch_time":"",
    "finish_time":""
}
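A response of this shape could be inspected as sketched below. The sketch assumes the response has already been decoded into a single Python dict, that each entry under "outputs" is a two-element list of file name and path as in the example, and that "COMPLETE", "KILLED" and "FAILED" are final states. None of these points is confirmed on this page, and show_job_url and summarize_job are hypothetical helper names.

```python
# Assumption: a job in one of these states will not change status again.
TERMINAL_STATUSES = {"COMPLETE", "KILLED", "FAILED"}


def show_job_url(base_url: str, tag: int) -> str:
    """URL for a GET request that returns information on one job tag."""
    return f"{base_url}/head/show_job/{tag}"


def summarize_job(record: dict) -> dict:
    """Reduce a decoded job record to the fields a caller usually checks."""
    outputs = record.get("outputs") or {}
    return {
        "tag": record["tag"],
        "finished": record["status"] in TERMINAL_STATUSES,
        "succeeded": record["status"] == "COMPLETE",
        # Each output entry is assumed to be [file name, path]; keep the path.
        "paths": {name: entry[1] for name, entry in outputs.items()},
    }


record = {
    "tag": 75121,
    "status": "COMPLETE",
    "outputs": {
        "assembly_fasta": [
            "SAL_BA2277AA_AS.scaffold.fasta",
            "/share_space/interact/outputs/75/75121/SAL_BA2277AA_AS.scaffold.fasta",
        ]
    },
}
summary = summarize_job(record)
print(show_job_url("http://calculationengineserver.ebbackend.org:1234", 75121))
print(summary["finished"], summary["paths"]["assembly_fasta"])
```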

Get information for multiple tasks

Information can be obtained for multiple tasks with a GET request to the URL

http://<TCE Host>:<TCE Port>/head/show_job/75121,75122,75123

or

http://<TCE Host>:<TCE Port>/head/show_jobs

via HTTP POST with:

FILTER = 'tag between 75121 and 75124'
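Both variants can be sketched in Python. The GET form joins the tags with commas. For the POST form, the sketch assumes that FILTER is sent as a URL-encoded form field, which this page does not specify. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def show_job_url(base_url: str, tags) -> str:
    """URL for a GET request covering several job tags at once."""
    return f"{base_url}/head/show_job/" + ",".join(str(tag) for tag in tags)


def show_jobs_request(base_url: str, filter_expr: str) -> Request:
    """POST request to /head/show_jobs carrying a FILTER expression.

    Assumption: FILTER is transmitted as a form field.
    """
    body = urlencode({"FILTER": filter_expr}).encode("ascii")
    return Request(f"{base_url}/head/show_jobs", data=body, method="POST")


base = "http://calculationengineserver.ebbackend.org:1234"
print(show_job_url(base, [75121, 75122, 75123]))
post = show_jobs_request(base, "tag between 75121 and 75124")
print(post.get_method(), post.full_url)
```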

Set the priority for a task in the queue

The priority for a task in the queue can be set, for example, with a GET request to the URL

http://<TCE Host>:<TCE Port>/head/set_priority?job_tag=1&priority=-9

The lower the value, the higher the priority. By default priority=0.
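A small helper can build this URL and reject values outside the -9 to +9 range given in the workgroup description of the task request. This is a sketch only: set_priority_url is a hypothetical name, and whether TCE itself rejects out-of-range values is not stated here.

```python
from urllib.parse import urlencode


def set_priority_url(base_url: str, job_tag: int, priority: int = 0) -> str:
    """URL for a GET request that sets the priority of a queued job."""
    if not -9 <= priority <= 9:
        raise ValueError("priority must be between -9 (highest) and 9 (lowest)")
    query = urlencode({"job_tag": job_tag, "priority": priority})
    return f"{base_url}/head/set_priority?{query}"


url = set_priority_url("http://calculationengineserver.ebbackend.org:1234", 1, -9)
print(url)
```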

Kill tasks

Tasks may be killed with a GET request to the URL

http://<TCE Host>:<TCE Port>/head/kill_job/{job_tag_1},{job_tag_2}

or

http://<TCE Host>:<TCE Port>/head/kill_jobs

via HTTP POST with:

FILTER = 'tag between 75121 and 75124'
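The two kill variants might be wrapped as follows. As before, this is only a sketch: the helper names are hypothetical, and sending FILTER as a URL-encoded form field is an assumption. Because killing jobs is destructive, the sketch refuses an empty tag list or a reversed range before building anything.

```python
from urllib.parse import urlencode
from urllib.request import Request


def kill_job_url(base_url: str, tags) -> str:
    """URL for a GET request that kills the listed job tags."""
    tags = list(tags)
    if not tags:
        raise ValueError("at least one job tag is required")
    return f"{base_url}/head/kill_job/" + ",".join(str(tag) for tag in tags)


def kill_jobs_request(base_url: str, first_tag: int, last_tag: int) -> Request:
    """POST request to /head/kill_jobs with a tag-range FILTER (form field assumed)."""
    if first_tag > last_tag:
        raise ValueError("first_tag must not exceed last_tag")
    filter_expr = f"tag between {first_tag} and {last_tag}"
    body = urlencode({"FILTER": filter_expr}).encode("ascii")
    return Request(f"{base_url}/head/kill_jobs", data=body, method="POST")


base = "http://calculationengineserver.ebbackend.org:1234"
print(kill_job_url(base, [75121, 75122]))
kill = kill_jobs_request(base, 75121, 75124)
print(kill.get_method(), kill.full_url)
```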

Show calculation nodes

http://<TCE Host>:<TCE Port>/head/show_nodes

A filter is also supported via HTTP POST, for example: FILTER = 'status in ("RUNNING")'
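A status filter of this form could be generated as sketched below. The sketch assumes that several statuses may be listed inside the parentheses separated by commas, and that FILTER is sent as a URL-encoded form field. Only the single-status form with "RUNNING" appears on this page, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def status_filter(statuses) -> str:
    """Build an expression such as: status in ("RUNNING")."""
    statuses = list(statuses)
    if not statuses or any('"' in status for status in statuses):
        raise ValueError("statuses must be non-empty and contain no double quotes")
    quoted = ", ".join(f'"{status}"' for status in statuses)
    return f"status in ({quoted})"


def show_nodes_request(base_url: str, statuses) -> Request:
    """POST request to /head/show_nodes restricted by node status (form field assumed)."""
    body = urlencode({"FILTER": status_filter(statuses)}).encode("ascii")
    return Request(f"{base_url}/head/show_nodes", data=body, method="POST")


nodes = show_nodes_request("http://calculationengineserver.ebbackend.org:1234", ["RUNNING"])
print(status_filter(["RUNNING"]))
print(nodes.get_method(), nodes.full_url)
```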