Analysis objects - Workspaces
=============================
Analysis objects (sometimes called workspaces) are items in Enterobase such
as trees, custom columns/views etc. A workspace (basically just a list of
strains) is also an analysis object (just to confuse matters).

Data for the analysis is stored in the user_preferences table. The id of the
entry in this table is the id used to get a handle on the analysis and is
used in urls and programmatically, e.g. get_analysis_object(23233).

Description in the Config
-------------------------
All analysis objects need to be described in the ANALYSIS_TYPES dictionary of
the top level config. The following keys are used:

* **label** The human readable label
* **icon** The icon used for the analysis type when listing workspaces
* **shareable** If True, the object can be shared
* **parameters** A dictionary of parameters which describe the analysis, in the format parameter_name:parameter_label. This way the label can be changed without affecting any of the code.
* **Class** The class used to manipulate the analysis. The class should be in entero/ExtraFuncs/workspace and inherit from the base Analysis class
* **url** The url for analysis types which can be shown in a stand-alone web page. It can include <database>, where the name of the database will be inserted in the final url, and <id>, where the id of the analysis being shown will be substituted, e.g. *species/<database>/snp_project/<id>*
* **job_required** If True will indicate that a job is required to create the analysis. The object's data should have a 'complete' or 'failed' tag once the job is complete
* **create_from_search** If True, the analysis can be created from the main search page, e.g. Trees

Other parameters specific to an individual analysis type can also be added.
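
As an illustration, an entry in ANALYSIS_TYPES might look like the sketch below. Only the key names and the url template come from the description above. The type name ``snp_project``, the icon name, the parameter names and the class name ``SNPProject`` are hypothetical. The ``build_url`` helper is also an assumption, showing one way the ``<database>`` and ``<id>`` placeholders could be substituted.

.. code-block:: python

    # Hypothetical entry; only the key names come from the description above.
    ANALYSIS_TYPES = {
        "snp_project": {
            "label": "SNP Project",
            "icon": "snp_icon.png",            # assumed icon name
            "shareable": True,
            "parameters": {                     # parameter_name: parameter_label
                "reference": "Reference Genome",
                "strain_number": "Number of Strains",
            },
            "Class": "SNPProject",              # assumed class name
            "url": "species/<database>/snp_project/<id>",
            "job_required": True,
            "create_from_search": False,
        }
    }


    def build_url(analysis_type, database, analysis_id):
        """Fill the <database> and <id> placeholders of a type's url template."""
        template = ANALYSIS_TYPES[analysis_type]["url"]
        return template.replace("<database>", database).replace("<id>", str(analysis_id))


    print(build_url("snp_project", "senterica", 23233))

Because the code only ever refers to the parameter names, a label such as "Number of Strains" can be reworded in the config without touching anything else.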


user_preferences database table
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Analysis objects such as workspaces and user preferences are stored in the user_preferences table within the common entero database.  There is a single table that covers all the species within an Enterobase instance.
The id and user_id columns are self-explanatory. The data column contains,
in json format, everything describing the analysis. All analysis
objects have the following:

* **date_created** The date the analysis type was created
* **date_modified** The date the analysis type was last modified
* **data**
    * **description** A short description of the analysis
    * **links** A dictionary of link_name:link_href
* **complete** 'true' if the analysis job is complete
* **failed** 'true' if the analysis job has failed
* **job_id** The id of the analysis job
* **name** The name of the workspace
* **type** The type of workspace
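
Since ``complete`` and ``failed`` hold the string 'true' rather than a boolean, code reading the data column has to compare against that string. A minimal sketch, assuming the data column has already been read as a json string. The helper name ``job_status`` is hypothetical and not part of Enterobase, and treating the absence of both tags as "pending" is an assumption.

.. code-block:: python

    import json


    def job_status(data_json):
        """Return 'complete', 'failed' or 'pending' for an analysis data column."""
        data = json.loads(data_json)
        if data.get("failed") == "true":
            return "failed"
        if data.get("complete") == "true":
            return "complete"
        # Assumption: neither tag is present while the job is still running
        return "pending"


    print(job_status('{"job_id": 11583729, "complete": "true"}'))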

Simple workspaces
-----------------
Workspaces are stored in the user_preferences table of the system database,
with all the associated data held as json in the data column, in the following format:

.. code-block:: python


    {
        "grid_params": {...},
        "data": {...}
    }


The data comprises the following:

* **experimental_data**  The name (description) of the scheme currently displayed
* **strain_ids** A list of all strain Ids
* **sort_order** A list of the columns used for sorting, each value being a list containing the name of the table with the sort column ('main' for the strain table), the name of the column and either 1 (ascending) or 0 (descending)  
* **current_page** The index of the current page
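
To make the layout of ``sort_order`` concrete, the sketch below builds a small example of the data and turns each sort entry into a readable description. The strain ids, the scheme name and the column names are made up for illustration, and ``describe_sort`` is a hypothetical helper.

.. code-block:: python

    # Illustrative values only; the keys follow the list above.
    workspace_data = {
        "experimental_data": "cgMLST V2 + HierCC V1",
        "strain_ids": [101, 102, 103],
        # Each entry is [table, column, direction]: 1 is ascending, 0 is descending
        "sort_order": [["main", "strain", 1], ["main", "collection_year", 0]],
        "current_page": 0,
    }


    def describe_sort(sort_order):
        """Convert [table, column, direction] triples into readable strings."""
        directions = {1: "ascending", 0: "descending"}
        return [
            f"{table}.{column} {directions[direction]}"
            for table, column, direction in sort_order
        ]


    print(describe_sort(workspace_data["sort_order"]))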

Some of the various types of simple workspaces include:

* **main_workspace** Used when "Workspace/Save as" is used to save a specific set of strains
* **grid_layout** Saves user's customised layouts of the main and experimental grids. The "name" column saves the type of grid whose layout is being saved.

Storage of user's workspace list
--------------------------------
The list of workspaces associated with a user is stored in a workspace of type **workspace_folders**.

The format is a dictionary of folders, each describing the workspaces it contains, its subfolders and its text. Every structure has a Root folder with an id of RN. The general format is:

.. code-block:: none

    {
        "folder_id":{
            "id":"folder_id",
            "workspaces":[id1,id2,....],
            "children":["folder2","folder3",.....],
            "text":"folder_name"
        },
        .........
    }

For example, the folder tree

.. code-block:: text

    Root
    │
    └───Project 1
    │    big snp tree \\id is 3
    │    all typhi \\id 365
    │
    └───Project 2
          small tree \\id 673

would have the following structure:

.. code-block:: json

    {
        "RN":{
            "id":"RN",
            "workspaces":[],
            "children":["j1","j2"],
            "text":"Root"
        },
        "j1":{
            "id":"j1",
            "workspaces":[3,365],
            "children":[],
            "text":"Project 1"
        },
        "j2":{
            "id":"j2",
            "workspaces":[673],
            "children":[],
            "text":"Project 2"
        }
    }
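
The structure can be walked recursively from the root. The sketch below uses the example above to gather every workspace id under a given folder. ``collect_workspaces`` is a hypothetical helper name, not an Enterobase function.

.. code-block:: python

    folders = {
        "RN": {"id": "RN", "workspaces": [], "children": ["j1", "j2"], "text": "Root"},
        "j1": {"id": "j1", "workspaces": [3, 365], "children": [], "text": "Project 1"},
        "j2": {"id": "j2", "workspaces": [673], "children": [], "text": "Project 2"},
    }


    def collect_workspaces(folders, folder_id="RN"):
        """Return the ids of all workspaces in a folder and its subfolders."""
        folder = folders[folder_id]
        ids = list(folder["workspaces"])
        for child_id in folder["children"]:
            ids.extend(collect_workspaces(folders, child_id))
        return ids


    print(collect_workspaces(folders))        # every workspace under Root
    print(collect_workspaces(folders, "j1"))  # only those in Project 1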

A user's individual folders are refreshed each time they are requested via
/species/get_user_workspaces (get_user_workspaces in entero.species.views).
This method retrieves the folder structure and then removes any workspaces
that have been deleted, along with any workspaces shared with the user that
have since been deleted or unshared. Any workspaces that are new are added to the
root folder.
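
The clean-up described above can be sketched as follows. This is not the code in entero.species.views. It is a simplified illustration that assumes the set of workspace ids currently available to the user (``valid_ids``) has already been worked out, and ``sync_folders`` is a hypothetical name.

.. code-block:: python

    def sync_folders(folders, valid_ids):
        """Drop stale workspace ids and add unseen ones to the root folder (RN)."""
        valid_ids = set(valid_ids)
        seen = set()
        for folder in folders.values():
            # Keep only workspaces that still exist and are still shared
            folder["workspaces"] = [w for w in folder["workspaces"] if w in valid_ids]
            seen.update(folder["workspaces"])
        # Anything not yet filed in a folder is new and goes to the root
        folders["RN"]["workspaces"].extend(sorted(valid_ids - seen))
        return folders


    folders = {
        "RN": {"id": "RN", "workspaces": [], "children": ["j1"], "text": "Root"},
        "j1": {"id": "j1", "workspaces": [3, 365], "children": [], "text": "Project 1"},
    }
    # Workspace 365 has been deleted and workspace 900 is new
    sync_folders(folders, valid_ids=[3, 900])
    print(folders["RN"]["workspaces"], folders["j1"]["workspaces"])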

This does not happen for public folders. Instead, workspaces are removed from
them upon deletion and added in make_public.

The structure of buddies cannot be altered and its ids are not stored. It is
created on the fly, with each user name as a folder containing any shared folders.

Analysis related workspaces
---------------------------

When an analysis such as a GrapeTree analysis is performed, the results are stored as a workspace, allowing it to be returned to later and reviewed or modified.

Many of these types have large amounts of associated data. These are not stored in the user_preferences table; instead, the data column contains entries giving links to where the files are stored. As an example, the following shows the data entries for a GrapeTree workspace. As well as the standard entries described previously, the data_file entry gives the location of the file containing the input and config data for the workspace, and the nwk and profile entries give the location of the output files. The output files are in the standard location for jobs (in this case job 11583729) and the other non-job related data are in the workspaces directory, under the 7444 user id and 109531 workspace identifier.

.. code-block:: json

    {"parameters":{"strain_number":116,"scheme":"cgMLST V2 + HierCC V1","algorithm":"NINJA NJ"},
     "algorithm":"ninja",
     "date_created":"2024-03-12 12:07",
     "date_modified":"2024-03-12 12:07",
     "data_file":"\/share_space\/workspace\/senterica\/7444\/109531\/data.json",
     "job_id":11583729,
     "nwk":"\/share_space\/interact\/outputs1\/11583\/11583729\/tree.nwk",
     "complete":"true",
     "profile":"\/share_space\/interact\/outputs1\/11583\/11583729\/profile.list"}
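
The ``\/`` sequences in the paths are ordinary json escapes for ``/``, so a standard json parser returns plain paths. The sketch below parses a shortened copy of the entry above and recovers the species, user id and workspace identifier from ``data_file``. It assumes the path layout seen in this one example.

.. code-block:: python

    import json
    from pathlib import PurePosixPath

    # Raw string so the json "\/" escapes reach the parser unchanged
    raw = r'''{"data_file":"\/share_space\/workspace\/senterica\/7444\/109531\/data.json",
     "job_id":11583729,
     "nwk":"\/share_space\/interact\/outputs1\/11583\/11583729\/tree.nwk",
     "complete":"true"}'''

    data = json.loads(raw)  # "\/" decodes to a plain "/"
    data_file = PurePosixPath(data["data_file"])

    # The three directories above data.json: species, user id, workspace id
    species, user_id, workspace_id = data_file.parts[-4:-1]
    print(species, user_id, workspace_id)
    print(data["nwk"])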


Folder Structure for storing associated workspace data
------------------------------------------------------

The data for both individual and public workspaces is stored in the folder defined by BASE_WORKSPACE_DIRECTORY in config.py, normally /share_space/workspace/.
The directory structure is

.. code-block:: text

    .../<species>/<userid>/<workspace_id>/

where <userid>=0 for public workspaces.
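
A path of this form could be assembled as in the sketch below. The function name ``workspace_directory`` is hypothetical, and the base directory is hard-coded to the usual value rather than read from config.py. The final path component is called ``workspace_id`` here, following the GrapeTree example earlier where that directory is the workspace identifier; that naming is an assumption.

.. code-block:: python

    from pathlib import PurePosixPath

    # Assumed default, taken from the "normally /share_space/workspace/" note
    BASE_WORKSPACE_DIRECTORY = "/share_space/workspace/"
    PUBLIC_USER_ID = 0  # public workspaces are stored under user id 0


    def workspace_directory(species, workspace_id, user_id=PUBLIC_USER_ID):
        """Build <base>/<species>/<userid>/<workspace_id> as a POSIX path."""
        return PurePosixPath(
            BASE_WORKSPACE_DIRECTORY, species, str(user_id), str(workspace_id)
        )


    print(workspace_directory("senterica", 109531, user_id=7444))
    print(workspace_directory("senterica", 109531))  # public workspace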