EnteroBase API Implementation

EnteroBase supports two APIs:

  • api_1: used for internal communication between the client GUI and the EnteroBase server, for operations such as uploading data
  • api_2: an HTTP-based protocol for interaction between client apps and the EnteroBase server

API_1

This is a RESTful API built on the flask_restful Python library. It handles certain interactions between the GUI and the EnteroBase server. Note that its structure and architecture are influenced by earlier implementations of EnteroBase. Functions handled include:

  • Uploading new and modified metadata: upload.py
  • Initiating assemblies: AssembleAPI Jobs.py
  • Initiating pipeline jobs: NservAPI Jobs.py - Note that the name is misleading. This handles all types of pipeline jobs, which are initiated on enterobase-web and passed to CRobot, which in turn uses NServ for nomenclature jobs

API_2

Api_2 is based around the Marshmallow Python library (https://marshmallow.readthedocs.io/en/stable/examples.html), which allows schemas to be defined that control external access to internal resources: the query takes the form of an HTTP request and the response is JSON formatted. Conceptually, the simplest use of Marshmallow is to define external access to database tables, where the parameters correspond to the columns of the table and the schema definitions specify:

  • the data type of each column
  • mappings from external identifiers to database column ids (attribute =)
  • how each parameter may be used: by default a parameter can be used in filters and is returned as data, but it can be restricted to filters only (load_only=True) or to returned data only (dump_only=True)

This approach allows a user interface to be generated dynamically using Swagger UI (https://swagger.io/tools/swagger-ui/), which is part of the flask-apispec library and which contains all of the JavaScript files required to bring the UI to life. The generated interface is at https://enterobase.warwick.ac.uk/api/v2.0/swagger-ui

The schemas are defined in api_2_0/schemas.py, and are of the following general form, e.g. for accessing data from the strains table:

class StrainSchema(GenericSchema):
    created = ma.fields.DateTime(dump_only=True)
    comment = ma.fields.String()
    strain_name = ma.fields.String(attribute='strain')
    lab_contact = ma.fields.String(attribute='contact')
    source_niche = ma.fields.String()
...
def get_description(self, field_name):
    description = dict(
        database = 'Species database name (senterica, ecoli, yersinia, mcatarrhalis) for Salmonella, Escherichia, Yersinia, Moraxella respectively',
        barcode = 'Unique barcode for Strain records, <database prefix>_<ID code> e.g. SAL_AA0001AA',
...

The resources that expose these schemas are implemented in api_2_0/resources.py, which provides all of the information required to serve each schema. For the schema above, the following class implements StrainSchema on the Strains table:

class StrainsResource(AbstractListResource):

    schema = StrainSchema()
    table_name = 'Strains'

    def __init__(self):
        self.table_name = 'Strains'
        self.schema = StrainSchema()
        self.name = 'strains'
        self.description = 'Strain metadata'

These classes include common code for generating the information that Swagger uses to build the user interface:

def update_doc(self, docs):
    doc_string = docs.spec._paths['/api/v2.0/{database}/%s/{barcode}' % self.name]
    for method in doc_string:
        doc_string.get(method)['tags'] = [self.name.title()]
        doc_string.get(method)['description'] = self.description
        doc_string.get(method)['responses'] = self.__format_responses(method)

Note that the URL path used to look up docs.spec._paths indicates which values are required in order to use this specific API (database and barcode in this case). These values are annotated as required in the Swagger user interface.
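
To make the role of the placeholders concrete, here is a small standard-library sketch that lists the placeholders in such a path and fills them in. It only illustrates how the path template reads; it does not show how flask-apispec itself derives the required parameters, and the helper name required_path_params is hypothetical.

```python
import re


def required_path_params(path_template):
    """Return the placeholder names found in a path template, in order."""
    return re.findall(r"\{(\w+)\}", path_template)


# Same template as in update_doc, with self.name set to 'strains'.
name = "strains"
path = "/api/v2.0/{database}/%s/{barcode}" % name

# Placeholders that a caller has to supply.
params = required_path_params(path)

# Fill the placeholders with the example values used in the field descriptions.
url = path.format(database="senterica", barcode="SAL_AA0001AA")
```

With these inputs, params is ['database', 'barcode'] and url is '/api/v2.0/senterica/strains/SAL_AA0001AA'.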

The classes contain a lot of table-specific and parameter-specific code to deal with the idiosyncrasies of individual parameters. They inherit from the MethodResource class in flask_apispec as follows:

  • LoginResource <- LoginSchema
  • TopResource <- TopSchema
  • AbstractResource
    • StrainResource <- StrainSchema
    • TraceResource <- TracesSchema (Traces)
    • AssemblyResource <- AssembliesSchema (Assemblies)
    • SchemeResource <- SchemeSchema (Schemes)
  • AbstractListResource
    • StrainsResource <- StrainSchema
    • StrainsVersionResource <- StrainSchema (StrainsArchive)
    • TracesResource <- TracesSchema (Traces)
    • AssembliesResource <- AssembliesSchema (Assemblies)
    • SchemesResource <- SchemeSchema (Schemes)
  • LookupResource <- LookupSchema (lookup)
  • LookupListResource <- LookupSchema (lookup)
  • NServResource
    • LociResource <- LociSchema (loci)
    • AllelesResource <- AllelesSchema (alleles)
    • StsResource <- StsSchema (STs)
  • StrainDataResource <- StrainDataSchema (straindata)

Most of the resource code queries the EnteroBase database, but the NServResource resources compose a query that is passed to NServ in an HTTP request and then process the JSON data that is returned, using the following core code:

post_string = self.SERVER + '/search.api/%s/%s/%s' % (nserv_db_name, nserv_scheme, self.table_name)
...
try:
    response = requests.post(post_string, data=params, timeout=app.config['NSERV_TIMEOUT'])