Searching EnteroBase
Clicking through Search Strains brings up the Search menu, with the data panels below it. Once you submit a search, these panels will fill with your search results.
The following is an example of how a search is performed using Salmonella:
- Click the Field dropdown and select Serovar. N.B. A serovar is a sub-division of the species based on antigenicity. In Salmonella, serovars are often named after geographic locations.
- In the Value field to the right, type in Newport (a common Salmonella serovar). A dropdown will appear below giving suggestions. You can click on ‘Newport’ in this list or continue typing the full word.
- Click the AND button to add an additional condition to the search.
- Click the 2nd ‘Field’ dropdown and select ‘Country’.
- In the ‘Value’ field to the right, type in ‘United Kingdom’. Again notice the dropdown that appears below giving suggestions. Click on ‘United Kingdom’ in this list or continue typing the full phrase. This is now a reasonably complicated query searching for strains of serovar Newport that were isolated in the United Kingdom.
- Click ‘Submit’.
The exact results will differ, as new data will have been added since this page was prepared. However, you should see a number of strain records, with metadata in the left pane and experimental data in the right pane.
The first few rows show records from the legacy MLST database (http://mlst.warwick.ac.uk), and as such they have ‘MLST(legacy)’ as the data source. These data are derived from Sanger Traces and have no NGS data, so a number of assembly statistics and genotyping fields are blank. While EnteroBase shows past MLST data, it does not accept new data based on Sanger Traces.
The other rows (at the bottom) are derived from sequenced reads from the SRA. Their status shows they’ve been assembled and the data Source shows the SRA accession number.
To return to the search function click ‘Maximise’.
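The two-condition search above amounts to a logical AND over strain records. The following is a minimal sketch of that logic only, not EnteroBase code: the records, the field names and the use of exact matching are all assumptions made for illustration.

```python
# Hypothetical strain records; the field names mirror the search form,
# not a documented EnteroBase schema.
records = [
    {"Name": "strain_A", "Serovar": "Newport", "Country": "United Kingdom"},
    {"Name": "strain_B", "Serovar": "Newport", "Country": "United States"},
    {"Name": "strain_C", "Serovar": "Dublin", "Country": "United Kingdom"},
]


def matches_all(record, conditions):
    """Return True when every (field, value) pair matches, i.e. a logical AND."""
    return all(record.get(field) == value for field, value in conditions)


# Each AND button click adds one more (field, value) pair to the query.
query = [("Serovar", "Newport"), ("Country", "United Kingdom")]
hits = [r["Name"] for r in records if matches_all(r, query)]
print(hits)  # ['strain_A']
```

Only the record that satisfies both conditions is returned, which is why adding conditions with AND narrows the result set.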
Saving and loading queries
It is a little time consuming to repeatedly enter all the information for complex queries. EnteroBase has a feature where important queries can be saved and loaded on demand.
To save the current query for later:
- Enter an informative query name e.g. ‘Newport_UK’ in the text box right of ‘Save Current Query’ (2).
- Click the ‘Save’ button (floppy disk icon with a down arrow).
Now press the ‘Clear’ button near the bottom right and try loading your query.
To load a query:
- Click the dropdown, and select the query required.
- Click the ‘Load’ button (floppy disk icon with an up arrow).
Advanced query functions
There are a number of extra options to enhance your searches:
- ‘Ignore Legacy Data’: Checking this box excludes legacy data from searches (1).
- ‘Only Editable Strains’: Only shows strains that can be edited. There will be no results if you haven’t uploaded anything to EnteroBase.
- ‘Show failed assemblies’: Show assemblies that have failed the quality control. These are usually hidden. These strains will not have any genotyping or other analysis run on them, but it may be useful to check the assembly statistics and download the contigs to see what went wrong.
- ‘Show sub strains’: Some strains have been grouped together for various reasons (see Section Uberstrains). These are usually hidden from search results but are shown if this is checked.
There are also predefined searches that can be run with one-click, under ‘Predefined Search’ in the top right:
- ‘All Strains’: Fetches all strain records for the whole database. This can be slow on large databases.
- ‘My Strains’: Fetches strains that belong to you.
- ‘Latest XXX’: Enter a number in the number field and the search will fetch that many of the strain records most recently entered in the database.
Uberstrains
Most bacterial isolates/strains in EnteroBase are linked to one set of metadata and one set of genotyping data. However, some entries have two or more sets of genotyping data. For example, EnteroBase includes some strains for which legacy MLST data from classical Sanger sequencing exists in addition to MLST genotypes from genomic assemblies. Similarly, some users have uploaded the same reads to both EnteroBase and SRAs, and both sets of data are present in EnteroBase because it automatically imports all new SRA records. In still other cases, genomes of the same strain have been sequenced by independent laboratories, or multiple laboratory variants have been sequenced that are essentially indistinguishable (e.g. S. enterica LT2 or E. coli K-12).
Such indistinguishable strains can skew analyses, which may produce false clusters that in reality are just the same strain. Thus, in EnteroBase such entries are merged and a single Uberstrain is created. These are identified by the Uberstrain column at the very left of the strains table. Normally only the Uberstrain is shown, removing the need to worry about this de-replication. However, it is still possible to see the sub-strains associated with an Uberstrain (see below).
To do this, check “Show Sub Strains” in the query dialog (blue box) when searching. Where multiple strains are associated with an Uberstrain, they can be viewed using the expand icon in the Uberstrain column (red box). Clicking on this icon will show all the sub strains associated with this master strain. The master strain is usually the most complete entry, for example one with a complete closed genome.
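The relationship between an Uberstrain and its sub strains can be pictured as a simple grouping. The sketch below is illustrative only: the identifiers are made up, and it assumes that a master strain is marked by pointing to itself, which may not match how EnteroBase stores this internally.

```python
# Hypothetical rows of (strain id, Uberstrain id).
# Assumption: a master strain lists itself as its own Uberstrain.
rows = [
    ("S1", "S1"),
    ("S2", "S1"),
    ("S3", "S3"),
    ("S4", "S1"),
]


def group_by_uberstrain(rows):
    """Map each Uberstrain id to the list of sub strains merged under it."""
    groups = {}
    for strain, uber in rows:
        subs = groups.setdefault(uber, [])
        if strain != uber:
            subs.append(strain)
    return groups


groups = group_by_uberstrain(rows)
print(sorted(groups))  # default view, masters only: ['S1', 'S3']
print(groups["S1"])    # expanded view of S1: ['S2', 'S4']
```

Showing only the keys corresponds to the default, de-replicated view. Expanding a key corresponds to checking “Show Sub Strains” and clicking the expand icon.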
Downloading Data
The current data can always be saved to file by `Data -> Save To Local File`.
Because of browser restrictions, the data is actually treated as a download, so
it will probably end up in the downloads directory that your browser uses. Some
browsers, however, let you choose the location. The file is a tab delimited
text file, which can be opened in any spreadsheet program. The file will contain
all the strain metadata and any associated experimental data in the right hand pane.
For large schemes, this data is not very useful, so EnteroBase enables you
to download all the allele information separately.
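Because the saved file is tab delimited text, it can also be read with a script instead of a spreadsheet. The sketch below uses Python's standard `csv` module. The column names and rows are invented stand-ins, and an in-memory string takes the place of a real download.

```python
import csv
import io

# Stand-in for a saved file. With a real download, replace this with
# open(path, newline="", encoding="utf-8"), where path is wherever your
# browser saved the file.
sample = io.StringIO(
    "Name\tSerovar\tCountry\n"
    "strain_A\tNewport\tUnited Kingdom\n"
    "strain_B\tDublin\tIreland\n"
)


def read_strain_table(handle):
    """Read a tab delimited table into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_strain_table(sample)
print(len(rows))           # 2
print(rows[0]["Serovar"])  # Newport
```

Each row becomes a dictionary keyed by the header line, so metadata and experimental data columns can be looked up by name.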
As an example, in the Salmonella database, query for serovar equals Dublin. By default the experimental data shown will be assembly stats. In the Experimental Data dropdown, choose cgMLST. The right hand panel will then show the cgMLST ST for each record, which is not very useful on its own (but then, would you want to look at 3020 columns of allele numbers?). Right-click on the right hand panel and select Save all. A dialog will appear showing the progress of retrieving the information (100 records are obtained at a time). Once the data has been retrieved, it can be saved as with any other file (type a file name in the text box and press save).
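Retrieving 100 records at a time is an instance of batching. The sketch below shows the general idea only. It is not how EnteroBase implements the retrieval, and the record identifiers are placeholders.

```python
def batches(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder identifiers standing in for the records returned by a query.
record_ids = list(range(250))
sizes = [len(batch) for batch in batches(record_ids)]
print(sizes)  # [100, 100, 50]
```

A progress dialog can be updated after each batch, which is why the progress advances in steps rather than continuously.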
Downloading Data from species with large numbers of samples
There can be problems with downloading metadata and experimental data for large numbers of samples, e.g. if ‘All Strains’ is used with Salmonella. Part of the problem is that EnteroBase downloads all of the data to the web browser before prompting for it to be sent to a file. For successful downloads of large amounts of data through the GUI:
- It is important that your computer has sufficient memory. Although it is possible on a PC with 4 GB of memory, more is preferable.
- The internet connection should be fast and free from dropouts, such as can occur with poor WiFi connections.
- Firefox is recommended. Recent ‘performance improvements’ for Edge and Chrome seem to cause problems with large downloads.
- The computer should be restarted and no other programs run while data is being downloaded.
- Before doing the initial ‘All strains’ search for a species with a large number of strains, Firefox should be restarted and the default ‘Assembly stats’ experimental data should be used. The results should then be ‘Saved’ to local file.
- The other types of experimental data should be selected and ‘Saved’ once the data in the GUI has been updated.
- The GUI will show a dynamic ‘Processing Query’ dialog box while the data is being assembled at the server. The dialog box will appear to freeze while the data is being downloaded to the web browser. Be patient, as this can take a while for large numbers of strains.
- When switching to other Experimental data the GUI will again freeze while the data is being downloaded, which will also take some time.
It is not necessary to take all these steps when downloading data for species where there are fewer strains.
Editing Metadata
To edit records, click the edit mode check box. A pencil icon in the column boxed in blue shows that you have permission to edit the metadata of that record. Click on any cell to alter its content and the cell should turn yellow. Once you have made all the changes you require, upload them to the database: either press the upload changes icon (red box) or right-click on a row containing an edited cell and select “Upload Changes in Row”.
After the changes are uploaded, the cells should turn back to their normal colour
and a dialog will inform you whether the update has been successful.
A Search and Replace (`Edit -> Search and Replace`) and undo function (`Edit -> Undo` or ctrl+Z) are available, but for large scale editing it may be easier to load the data into Excel.
To do this, perform a query to find the strains to be edited (e.g. click on the
My strains icon or `Data -> My strains`) and then save them to a local file with
`Data -> Save to Local File`. Open this file in Excel and change some values.
Then re-save the file (making sure it is saved as tab delimited text). Reload the
file into EnteroBase by first clicking on the edit mode check box and then
clicking the Load Modified File icon (purple box). Any changes or errors in your
modified file will be shown as yellow or red cells.
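For repetitive corrections, the saved file can also be edited with a short script rather than in a spreadsheet, provided the result is still tab delimited text. The sketch below is one possible approach using Python's standard `csv` module. The column names and values are invented, and an in-memory string stands in for the saved file.

```python
import csv
import io

# Stand-in for the contents of a file saved from the strains table.
original = (
    "Name\tSerovar\tCountry\n"
    "strain_A\tNewport\tUK\n"
    "strain_B\tDublin\tIreland\n"
)


def replace_value(text, column, old, new):
    """Return the table with `old` replaced by `new` in one column, still tab delimited."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[column] == old:
            row[column] = new
        writer.writerow(row)
    return out.getvalue()


edited = replace_value(original, "Country", "UK", "United Kingdom")
print(edited)
```

The header line and column order are written back unchanged, so the edited text keeps the same layout as the file that was saved.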
In edit mode you can submit jobs, such as assemblies, for any strains that you have
editing rights to by using `Tools -> Assemble Selected` or
`Tools -> Call Scheme for Selected`.