Getting started with the EnteroBase API

All API requests must be made using HTTP Basic Authentication with a valid token. You must have an account on the EnteroBase website to get your token.

If you have API access to a database

Your API token should be displayed under ‘Important information’ on the main database dashboard for each species to which you have been given API access. You can copy it by clicking on the icon circled in red.

To access a database dashboard from the main page of https://enterobase.warwick.ac.uk, click the panel for the appropriate species. You will need to scroll to the bottom of the page to see the ‘Important information’ section.

If you DO NOT have API access to a database

Make sure you have an account at https://enterobase.warwick.ac.uk
Email us at enterobase@warwick.ac.uk with a message along these lines:

Hi,

I am [YOUR NAME] at [YOUR ORGANISATION]
and I would like access to the EnteroBase Database
[DATABASE OF INTEREST; Salmonella, E. coli, Clostridioides,
Yersinia, Moraxella].

I would like to use the API to [YOUR INTENDED PURPOSE]. This is for a
[COMMERCIAL/ACADEMIC] project about [ONE SENTENCE DESCRIPTION OF YOUR PROJECT].

My username/email on EnteroBase is [YOUR ENTEROBASE USERNAME OR THE EMAIL YOU USED]

Thanks


[YOUR NAME]

We will get back to you promptly about your request.

Testing out your Token

Once you have your token, you can use it in scripts to download data from EnteroBase. Here is a simple curl example that fetches a single assembled strain record:

curl --header "Accept: application/json" --user "<YOUR_TOKEN_HERE>:" "https://enterobase.warwick.ac.uk/api/v2.0/senterica/straindata?sortorder=asc&assembly_status=Assembled&limit=1"

You can make the same kind of request from Python. Here is a simple example:

from urllib.request import urlopen, Request
from urllib.error import HTTPError
import base64
import json

API_TOKEN = 'YOUR_TOKEN_HERE'

def create_request(request_str):
    # HTTP Basic auth: the token is the username and the password is empty ("<token>:"),
    # just like --user "<YOUR_TOKEN_HERE>:" in the curl example
    credentials = base64.b64encode('{0}:'.format(API_TOKEN).encode('utf-8')).decode('ascii')
    headers = {"Authorization": "Basic {0}".format(credentials)}
    return Request(request_str, None, headers)

address = 'https://enterobase.warwick.ac.uk/api/v2.0/senterica/straindata?assembly_status=Assembled&limit=1'

try:
    response = urlopen(create_request(address))
    # The response body is JSON; json.load parses it into a Python dictionary
    data = json.load(response)
    print(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))

except HTTPError as response_error:
    # Report the status code, reason, URL and the error message returned by the server
    print('%d %s. <%s>\n Reason: %s' % (response_error.code,
                                       response_error.reason,
                                       response_error.geturl(),
                                       response_error.read().decode('utf-8', 'replace')))

Note that this script accesses the senterica database. If you have been given access to a different database instead, replace ‘senterica’ in the address with the identifier of that database.
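Because only the database identifier and the query parameters change between requests, it can help to build the address in code rather than editing it by hand. The sketch below uses a hypothetical helper, `build_url`, which is not part of any EnteroBase library. It assumes the URL pattern `https://enterobase.warwick.ac.uk/api/v2.0/<database>/<endpoint>` seen in the examples on this page and uses the standard `urllib.parse.urlencode` to build the query string.

```python
from urllib.parse import urlencode

# Base address of version 2.0 of the EnteroBase API, as used in the examples above
API_BASE = 'https://enterobase.warwick.ac.uk/api/v2.0'

def build_url(database, endpoint, **params):
    """Hypothetical helper: build an address such as
    <API_BASE>/senterica/straindata?assembly_status=Assembled&limit=1.

    `database` is the database identifier (e.g. 'senterica') and `endpoint`
    is the resource path under it (e.g. 'straindata' or 'rMLST/sts').
    Keyword arguments become query parameters, in the order given.
    """
    url = '{0}/{1}/{2}'.format(API_BASE, database, endpoint)
    if params:
        # urlencode percent-escapes the values, so spaces and symbols are safe
        url += '?' + urlencode(params)
    return url

# The address used in the script above
print(build_url('senterica', 'straindata', assembly_status='Assembled', limit=1))
```

To query another database, only the first argument needs to change.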

The main steps to remember are:

Send your request (usually a GET) with the token in the HTTP Basic Authorization header.
Data usually comes back as JSON, so use a module such as json to parse it into a dictionary.
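Both steps can be sketched with the standard library alone. `basic_auth_header` and `parse_response` are hypothetical helper names used for illustration. The header format follows the curl example above, where the token is sent as the username and the password is left empty.

```python
import base64
import json

def basic_auth_header(token):
    """Return an HTTP Basic Authorization header value for an API token.

    The token is the username and the password is empty ("<token>:"),
    matching --user "<YOUR_TOKEN_HERE>:" in the curl example.
    """
    credentials = '{0}:'.format(token).encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')

def parse_response(body):
    """Parse a JSON response body (bytes or str) into a Python dictionary."""
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    return json.loads(body)

# A trimmed-down body shaped like the straindata response on this page
sample = b'{"links": {"records": 1, "total_records": 367716}, "straindata": {}}'
data = parse_response(sample)
print(data['links']['records'], data['links']['total_records'])
```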

Below is the kind of result you get back. JSON maps directly onto Python dictionaries and lists, and most responses are structured like this:

links
- paging: links to the previous/next page of data, like webpage pagination.
- records: number of records on this page.
- total_records: total number of records.
data, labelled after the endpoint you’ve fetched, in this case ‘straindata’.
- (data for each record, here keyed by strain barcode…)

{
    "links": {
        "paging": {
            "next": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/straindata?limit=1&assembly_status=Assembled"
        },
        "records": 1,
        "total_records": 367716
    },
    "straindata": {
        "SAL_FA6876AA": {
            "assembly_barcode": "SAL_LA1140AA_AS",
            "assembly_status": "Assembled",
            "city": null,
            "collection_date": 18,
            "collection_month": 5,
            "collection_time": null,
            "collection_year": 2016,
            "comment": null,
            "continent": "Oceania",
            "country": "Australia",
            "county": null,
            "created": "2016-05-18T09:00:04.867617+00:00",
            "download_fasta_link": "https://enterobase.warwick.ac.uk/upload/download?assembly_barcode=SAL_LA1140AA_AS&database=senterica",
            "email": null,
            "lab_contact": "Vitali Sintchenko",
            "lastmodified": "2016-09-06T22:55:53.045179+00:00",
            "latitude": null,
            "longitude": null,
            "n50": 442480,
            "orderby": "barcode",
            "postcode": null,
            "region": "New South Wales",
            "secondary_sample_accession": null,
            "serotype": "Enteritidis",
            "source_details": "NSW ERL",
            "source_niche": "Human",
            "source_type": "Laboratory",
            "strain_barcode": "SAL_FA6876AA",
            "strain_name": "NSW29-074",
            "top_species": "Salmonella enterica;100.0%",
            "uberstrain": "SAL_FA6876AA",
            "version": 1
        }
    }
}
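The `next` link under `links.paging` can be followed to walk through a result set page by page. The following is a minimal sketch that assumes two things not confirmed on this page: the final page omits the `next` link, and records sit under a key named after the endpoint. `iter_records` and its `fetch_json` parameter are hypothetical. Fetching is passed in as a function so that the paging logic can be tried offline.

```python
def iter_records(first_url, fetch_json, endpoint='straindata', max_pages=1000):
    """Hypothetical generator that follows links.paging.next across pages.

    `fetch_json(url)` must return one decoded JSON page as a dict (for example,
    by combining urlopen with json.load as in the script above). Both a dict
    keyed by barcode (as in the straindata example) and a plain list of records
    are handled. `max_pages` is a safety limit in case 'next' never disappears.
    """
    url = first_url
    pages = 0
    while url and pages < max_pages:
        page = fetch_json(url)
        pages += 1
        records = page.get(endpoint, {})
        if isinstance(records, dict):
            # e.g. {"SAL_FA6876AA": {...}}: yield only the record dicts
            yield from records.values()
        else:
            yield from records
        # Assumption: the last page has no 'next' link, which ends the loop
        url = page.get('links', {}).get('paging', {}).get('next')

# Offline demonstration with two made-up pages instead of real HTTP requests
fake_pages = {
    'page-1': {'links': {'paging': {'next': 'page-2'}, 'records': 1},
               'straindata': {'barcode-1': {'strain_name': 'example-1'}}},
    'page-2': {'links': {'paging': {}, 'records': 1},
               'straindata': {'barcode-2': {'strain_name': 'example-2'}}},
}
names = [record['strain_name'] for record in iter_records('page-1', fake_pages.get)]
print(names)  # ['example-1', 'example-2']
```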

More sample scripts are available at: https://bitbucket.org/enterobase/enterobase-scripts/

Understanding the EnteroBase API Structure

The API broadly follows the logical structure of MLST and NGS data, e.g. Strains > Traces > Assemblies, and Loci > Alleles > Sequence Types (STs).

There are generic query methods, such as Lookup and Info, that help with straightforward lookups of information. If you have any suggestions for new endpoints to help your work, please let us know.

The Swagger sandbox

Swagger is an API framework used by the EnteroBase API. It provides interactive documentation of the API, including information about endpoints, inputs, outputs and response codes.

EnteroBase’s interactive API documentation (swagger-ui): https://enterobase.warwick.ac.uk/api/v2.0/swagger-ui

You can jump right in and start trying out requests. A demo token is already embedded, but it is only valid for the Salmonella database (‘senterica’).

Understanding Barcodes

Almost all data in EnteroBase is assigned a Barcode, an identifier that is unique across all of EnteroBase. Barcodes have a simple structure, with parts separated by underscores: **SAL_AA0019AA_ST**

The first part (e.g. SAL) defines the database.
The middle part encodes an ID number; letters are used alongside digits to pack more information into each character, similar to a UK postcode (e.g. CV4 7AL).
The last part defines the datatype (e.g. ST for a Sequence Type record).

Databases are encoded as follows:

Genus	Tag
Salmonella	SAL
Escherichia	ESC
Yersinia	YER
Clostridium	CLO
Moraxella	MOR

Datatypes are encoded as follows:

Datatype	Barcode tag
Schemes	SC
Loci	LO
Alleles	AL
Assemblies	AS
STs	ST
Traces	TR
Strains	None (no suffix) or SS
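Because the structure is regular, a barcode can be split into its parts locally without calling the API. In the sketch below, `parse_barcode` is a hypothetical helper, not an EnteroBase function. It assumes every barcode has two or three underscore-separated parts and recognises only the tags listed in the tables above. The encoding of the middle ID part is not described here, so it is returned unchanged.

```python
# Tags from the two tables above; unknown tags are returned unchanged
DATABASE_TAGS = {
    'SAL': 'Salmonella',
    'ESC': 'Escherichia',
    'YER': 'Yersinia',
    'CLO': 'Clostridium',
    'MOR': 'Moraxella',
}
DATATYPE_TAGS = {
    'SC': 'Schemes',
    'LO': 'Loci',
    'AL': 'Alleles',
    'AS': 'Assemblies',
    'ST': 'STs',
    'TR': 'Traces',
    'SS': 'Strains',
}

def parse_barcode(barcode):
    """Split a barcode such as 'SAL_AA0019AA_ST' into (database, id, datatype).

    A barcode with no datatype suffix (e.g. 'SAL_FA6876AA') is treated as a
    strain, following the 'None (no suffix) or SS' entry for Strains.
    """
    parts = barcode.split('_')
    if len(parts) == 2:
        prefix, id_part = parts
        datatype = 'Strains'
    elif len(parts) == 3:
        prefix, id_part, suffix = parts
        datatype = DATATYPE_TAGS.get(suffix, suffix)
    else:
        raise ValueError('Unexpected barcode format: {0}'.format(barcode))
    return DATABASE_TAGS.get(prefix, prefix), id_part, datatype

print(parse_barcode('SAL_AA0019AA_ST'))  # ('Salmonella', 'AA0019AA', 'STs')
print(parse_barcode('SAL_FA6876AA'))     # ('Salmonella', 'FA6876AA', 'Strains')
```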

Rapid Barcode lookup

Some barcodes can be looked up quickly using the Lookup endpoint of the API. A request is as simple as:

https://enterobase.warwick.ac.uk/api/v2.0/lookup?barcode=SAL_AA0019AA_ST

This returns full information about the record, which in this case describes Sequence Type 19 (Salmonella Typhimurium).

{
  "records": 1,
  "results": [
    {
      "ST_id": 19,
      "accepted": 1,
      "alleles": [
        {
          "allele_id": 12,
          "locus": "hemD"
        },
        {
          "allele_id": 10,
          "locus": "aroC"
        },
        {
          "allele_id": 2,
          "locus": "thrA"
        },
        {
          "allele_id": 9,
          "locus": "hisD"
        },
        {
          "allele_id": 9,
          "locus": "sucA"
        },
        {
          "allele_id": 7,
          "locus": "dnaN"
        },
        {
          "allele_id": 5,
          "locus": "purE"
        }
      ],
      "barcode": "SAL_AA0019AA_ST",
      "create_time": "2015-11-24 19:59:36.295460",
      "index_id": 19,
      "info": {
        "hierCC": {
          "d1": "1",
          "d3": ""
        },
        "lineage": "",
        "predict": {
          "serotype": [
            [
              "Typhimurium",
              21061
            ],
            [
              "Typhimurium Monophasic",
              3583
            ]
          ]
        },
        "st_complex": "1",
        "subspecies": ""
      },
      "lastmodified": "2020-05-10 08:44:14.985690",
      "lastmodified_by": "zhemin",
      "reference": {
        "lab_contact": "DVI",
        "refstrain": "9924828",
        "source": "mlst.warwick.ac.uk"
      },
      "scheme": "UoW",
      "scheme_index": 1,
      "type_md5": "77cd2d2d-5d80-3e0d-dc4d-194a9dff2c14",
      "version": 7013
    }
  ]
}
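To use the Lookup endpoint from a script, the request needs the same Basic Authorization header as the earlier Python example. The sketch below defines three hypothetical helpers and is not an official client: `make_lookup_request` builds the request without sending it, `lookup` sends it, and `allele_profile` extracts the locus-to-allele mapping from one entry in `results`. It assumes the response shape shown above.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

LOOKUP_URL = 'https://enterobase.warwick.ac.uk/api/v2.0/lookup'

def make_lookup_request(barcode, token):
    """Build (but do not send) an authenticated request to the Lookup endpoint."""
    url = LOOKUP_URL + '?' + urlencode({'barcode': barcode})
    # Token as username, empty password, as in the curl example
    credentials = base64.b64encode('{0}:'.format(token).encode('utf-8')).decode('ascii')
    return Request(url, headers={'Authorization': 'Basic ' + credentials})

def lookup(barcode, token):
    """Send the lookup request and return the decoded JSON response as a dict."""
    with urlopen(make_lookup_request(barcode, token)) as response:
        return json.load(response)

def allele_profile(result):
    """Map each locus to its allele ID for one ST record from 'results'."""
    return {allele['locus']: allele['allele_id'] for allele in result['alleles']}

# Offline check against an excerpt of the response shown above
excerpt = {'records': 1, 'results': [{'ST_id': 19, 'alleles': [
    {'allele_id': 12, 'locus': 'hemD'}, {'allele_id': 10, 'locus': 'aroC'}]}]}
print(allele_profile(excerpt['results'][0]))  # {'hemD': 12, 'aroC': 10}
```

With a real token, `lookup('SAL_AA0019AA_ST', API_TOKEN)['results'][0]` would be passed to `allele_profile` in the same way.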

You can try out this feature in the [interactive documentation](https://enterobase.warwick.ac.uk/api/v2.0/swagger-ui).

Navigating the API

There is a top-level API page that lists direct links to the various endpoints for each database. The [interactive documentation](https://enterobase.warwick.ac.uk/api/v2.0/swagger-ui) is derived from the API itself and is always up to date.

If you get lost, either resource will help you find the right endpoint.

https://enterobase.warwick.ac.uk/api/v2.0

[
  {
    "description": "Salmonella",
    "links": {
      "assemblies": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/assemblies",
      "current": "https://enterobase.warwick.ac.uk/api/v2.0",
      "schemes": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/schemes",
      "straindata": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/straindata",
      "strains": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/strains",
      "sts": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/rMLST/sts",
      "traces": "https://enterobase.warwick.ac.uk/api/v2.0/senterica/traces"
    },
    "name": "senterica",
    "prefix": "SAL",
    "tables": "{'schemes': 'SC', 'snps': 'SN', 'alleles': 'AL', 'assemblies': 'AS', 'loci': 'LO', 'taxondef': 'TA', 'strains': 'SS', 'datadefs': 'DE', 'sts': 'ST', 'traces': 'TR', 'archive': 'AR', 'refsets': 'RE'}"
  }
]
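Because the listing is itself JSON, a script can look up endpoint links by database name rather than hard-coding them. The following minimal sketch uses an excerpt of the response above; `database_entry` and `table_tags` are hypothetical helpers. In this example the `tables` value is a string written in Python dict syntax (single quotes) rather than a nested JSON object, so it needs a second parsing step. It is an assumption that the field always takes this form.

```python
import ast

# An excerpt of the top-level listing above (one database, fewer links and tables)
listing = [{
    'description': 'Salmonella',
    'links': {
        'straindata': 'https://enterobase.warwick.ac.uk/api/v2.0/senterica/straindata',
        'sts': 'https://enterobase.warwick.ac.uk/api/v2.0/senterica/rMLST/sts',
    },
    'name': 'senterica',
    'prefix': 'SAL',
    'tables': "{'schemes': 'SC', 'sts': 'ST', 'traces': 'TR'}",
}]

def database_entry(listing, name):
    """Return the listing entry for one database (e.g. 'senterica')."""
    for entry in listing:
        if entry['name'] == name:
            return entry
    raise KeyError(name)

def table_tags(entry):
    """Parse the 'tables' string into a dict of table name -> barcode tag.

    json.loads would reject the single-quoted string, so ast.literal_eval is
    used; it evaluates only Python literals, never arbitrary code.
    """
    return ast.literal_eval(entry['tables'])

entry = database_entry(listing, 'senterica')
print(entry['links']['straindata'])
print(table_tags(entry)['sts'])  # ST
```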