diff --git a/README.md b/README.md
index b8f593d..abc17f1 100755
--- a/README.md
+++ b/README.md
@@ -1,193 +1,196 @@
 # Desuto & ParaDISE

 This Docker Compose configuration allows building and running a functional Desuto Web Viewer and Retrieval Interface (including the ParaDISE retrieval engine) from scratch on any Docker-enabled host.
 This README file aims to describe the structure of the containers and any possible configuration changes that may be required when setting up the system on a new host.

 ## Prerequisites

 To run the platform, you will need to install at least:

 * A recent version of Docker CE : https://docs.docker.com/install/linux/docker-ce/ubuntu/
 * A recent version of Docker Compose : https://docs.docker.com/compose/install/
 * Preferably a machine with a minimum of **10GB of available RAM** to store the visual indices in-memory

 ## Containers

 The ``docker-compose.yml`` file is made up of the following containers:

 1. **proxy** : Proxy facade for all services based on nginx
 2. **mysql** : MySQL server for storing image metadata (URLs, modalities, captions, etc.)
 3. **couchdb** : CouchDB server for storing image annotation data from the Web Viewer
 4. **uploads** : Basic nginx instance for serving uploaded images
 5. **images** : Basic nginx instance for serving images of the datasets used for retrieval
 6. **paradise-gf** : Glassfish Java application server instance hosting the ParaDISE engine
 7. **retrieval** : Apache instance hosting the Shambala-based retrieval interface
 8. **webviewer** : Node.js server for the Web Viewer / Annotation tool
 9. **iipsrv** : IIPImage server for generating the image tiles for the Web Viewer
 10. **slideprops** : Python Web Service for extracting slide properties (using Openslide)

 ## Volumes

 The following shared volumes are declared:

 1. **upload-volume** : Shared volume for uploaded images (served by the **uploads** container)

 ## Ports

 The following ports need to be open on the host machine to run all services correctly:

 1. **80** : Port 80 is used by the proxy facade to expose all underlying services

 ## Environment variables

 The following default environment variables are declared in the ``.env.template`` file.
 All current values assume that the server runs on ``localhost`` and all services are behind the **proxy** container.
-All these variables are injected in various configuration files required by Javascript and Node.js applications.
+All these variables are injected in various configuration files required by Javascript, Java and Node.js applications.
 If no special setup is required on the test host, this file does not need to be modified (apart from the last 2 values) and can directly be copied/renamed to `.env`.

 1. **COUCHDB_ADMIN_USER** : Username of the CouchDB administrator
 1. **COUCHDB_ADMIN_PASS** : Password for the CouchDB administrator
 1. **COUCHDB_DB_NAME** : Name of the database created for the Web Viewer annotation data
 1. **COUCHDB_PROTOCOL** : Protocol used by CouchDB (http by default, https can be managed by the proxy)
 1. **COUCHDB_PORT** : Default port to reach the CouchDB container (not exposed, but can be used internally by the other Docker containers)
 1. **COUCHDB_HOST** : Hostname of the CouchDB server (public-facing, localhost by default)
 1. **COUCHDB_ROOT_URL** : Public-facing URL of the CouchDB database
 1. **COUCHDB_BACKEND_ROOT_URL** : Backend URL for the CouchDB database
 1. **PARADISE_ROOT_URL** : Default URL of the ParaDISE engine
 1. **RETRIEVAL_ROOT_URL** : Default URL of the Retrieval Interface
 1. **IIP_ROOT_URL** : Default URL of the IIP server
 1. **VIEWER_ROOT_URL** : Default URL of the Web Viewer
 1. **SLIDEPROPS_ROOT_URL** : Default URL of the Slide Properties Web Service
 1. **SLIDEVIEWERDATA_LOCAL_PATH** : Default path of the data for the Web Viewer (uploaded images, converted images, overlays, etc.)
 1. **PARADISE_LOCAL_PATH**: Default path of the data for the ParaDISE retrieval system

 ## Running the platform

 ### Set up the data folders

 Before running the platform, you should set up the folder structure for the Web Viewer data, as well as the retrieval system data.

 #### Web Viewer data

 The Web Viewer data is organised with the following hierarchy:

 ```
 root
 |-- converted
     |-- Files converted automatically by the Web Viewer into the pyramidal,
     -- Deep-Zoom-compatible TIFF format
 |-- overlays
     |-- NAME_OF_UPLOADED_IMAGE.tif
         |-- feature-name.png
 |-- uploaded
     |-- WSI files uploaded via the Web Viewer interface
 ```

 The structure is fairly simple, only the "overlays" directory is a bit special.
 This folder must contain a subfolder for each WSI uploaded to the platform.
 Each of these subfolders may contain one or more PNG files representing an overlay for the WSI of that subfolder.

 #### Retrieval system data

 The Retrieval system data is organised with the following hierarchy:

 ```
 root
 |-- all-dmli-pubmed-info.sql
     This file is an SQL script containing all the info of a figure dataset with captions
     and modalities, such as the PubMedCentral dataset. The columns are the following:
     -------------------------------------------------------------
     | id | url | thumbnailURL | articleURL | caption | modality |
     -------------------------------------------------------------
 |-- images
     This folder contains all the patches and images that can be retrieved by the system,
     organized by dataset and split into several subfolders (by magnification, or other
     arbitrary levels)
     Example shown below:
     |-- Dataset1 (WSI for example)
         |-- 5X
             |-- XYZ.tif_idx_0__lvl_3__x2688_y2240.png
             |-- ...
         |-- 10X
             |-- ...
     |-- Dataset2 (PubMedCentral for example)
         |-- Subfolder1
             |-- Sub-subfolder1
                 |-- 12-0309-F-3.jpg
                 |-- ...
             |-- Sub-subfolder2
                 |-- ...
         |-- ...
 |-- paradise-files
     This folder contains all the files necessary for the ParaDISE backend to function,
     both for image as well as text retrieval. Details shown below:
     |-- conf
         This folder contains configuration files for all the visual indices available in
         the system, describing the used parameters for indexation, storing mechanism,
         retrieval settings, etc.
         Refer to the ParaDISE website (http://paradise.khresmoi.eu/) for more details.
         |-- pubmedcentral-config.json
         |-- ...
     |-- gt
         This folder contains the ground truth file for the classification algorithm used
         to automatically compute the image modality of a given image in a dataset.
         |-- train20XXGT.csv
     |-- image-lists
         This folder contains the lists of URLs/paths of images to index.
         |-- pubmedcentral-dmli.csv
         |-- ...
     |-- indices
         This folder contains CSV files with the features extracted from the images in
         each dataset
         |-- wsi-dataset-5x.csv
         |-- wsi-dataset-10x.csv
         |-- ...
     |-- lucene
         This folder contains the Lucene indices containing the caption and fulltext
         information for datasets such as PubMedCentral.
         |-- pubmedcentral-captions-2016-dmli
         |-- pubmedcentral-fulltext-2016
         |-- ...
     |-- vocabularies
         This folder contains vocabularies for "Bag-of-Words"-based visual feature
         extraction.
         |-- vocabulary_238.csv
         |-- ...
 ```

 ### Define the path in the ".env" file

-Whether you created the folder structure above manually or received a sample ZIP file with some images, you need to map the folder into the Docker container by setting the `SLIDEVIEWERDATA_LOCAL_PATH` and `PARADISE_LOCAL_PATH` variables in your ".env" file.
-Simply set them to the path of the folders you created:
+Whether you created the folder structure above manually or received a sample ZIP file with some images,
+you need to map some data folders into the Docker containers by setting the
+`SLIDEVIEWERDATA_LOCAL_PATH` and `PARADISE_LOCAL_PATH` variables in your ".env" file.
+Copy or rename the `.env.template` file to `.env` and modify the last 2 variables.
+Simply set them to the paths of the folders you created:

 ```
 # Linux example
 SLIDEVIEWERDATA_LOCAL_PATH=/home/slideviewer/slideviewerdata/
 PARADISE_LOCAL_PATH=/home/slideviewer/paradise-data/

 # Windows example
 SLIDEVIEWERDATA_LOCAL_PATH=C:/SlideViewerData/
 PARADISE_LOCAL_PATH=C:/ParaDISEData/
 ```

 Once these variables are defined, you should be able to correctly build and start all the containers.

 ### Starting the containers

 To create/start/restart all containers (in background mode), type the following command in the terminal **from the folder containing the docker-compose.yml file**:

 ```
 docker-compose -f docker-compose.yml up -d
 ```

 To stop the containers, type:

 ```
 docker-compose stop
 ```

 To remove all the containers (clean-up), type:

 ```
 docker-compose down
 ```

 ### Accessing the Web Viewer

 Wait for about 4-5 minutes to allow all services to load correctly, then access your browser at ``http://localhost`` to access the Web Viewer. (**NOTE**: Using Google Chrome is recommended).

 You can then log in with one of the following default user accounts defined in the ``docker-entrypoint.sh`` file in the ``desuto-couchdb`` directory:

 * User : ``user``, Password : ``userpass`` (Read-only access)
 * User : ``pathologist1``, Password : ``pathologistpass``
 * User : ``pathologist2``, Password : ``pathologistpass``
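
As a minimal sketch, the empty folder skeleton described under "Set up the data folders" could be created with a small shell helper like the one below. The function name `create_data_dirs` and its two arguments are hypothetical and not part of the repository, and the nesting of the `paradise-files` subfolders is inferred from the hierarchy listed in that section. The helper only creates empty directories and prints the two lines to put in `.env`; data files such as `all-dmli-pubmed-info.sql`, the indices and the images still have to be supplied separately.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the repository): creates the empty
# folder skeleton for the Web Viewer data and the retrieval system data.
#   $1: folder intended for SLIDEVIEWERDATA_LOCAL_PATH
#   $2: folder intended for PARADISE_LOCAL_PATH
create_data_dirs() {
    viewer_root="$1"
    paradise_root="$2"

    # Web Viewer data: converted, overlays and uploaded folders
    mkdir -p "$viewer_root/converted" \
             "$viewer_root/overlays" \
             "$viewer_root/uploaded" || return 1

    # Retrieval system data: images plus the paradise-files subfolders
    # (nesting assumed from the hierarchy in the README)
    mkdir -p "$paradise_root/images" || return 1
    for sub in conf gt image-lists indices lucene vocabularies; do
        mkdir -p "$paradise_root/paradise-files/$sub" || return 1
    done

    # Print the two variables in the format used by the .env file
    printf 'SLIDEVIEWERDATA_LOCAL_PATH=%s/\n' "$viewer_root"
    printf 'PARADISE_LOCAL_PATH=%s/\n' "$paradise_root"
}

# Example call (paths taken from the Linux example in the README):
# create_data_dirs /home/slideviewer/slideviewerdata /home/slideviewer/paradise-data
```

The printed lines can be compared with, or pasted over, the last 2 values of the `.env` file copied from `.env.template`.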