Repository with all the components of the Desuto Viewer/Annotation/Retrieval platform.
This Docker Compose configuration allows building and running a functional Desuto Web Viewer and Retrieval Interface (including the ParaDISE retrieval engine, see http://paradise.khresmoi.eu for more information) from scratch on any Docker-enabled host.
This README describes the structure of the containers and any configuration changes that may be required when setting up the system on a new host.
To run the platform, you will need to install at least:
- A recent version of Docker CE : https://docs.docker.com/install/linux/docker-ce/ubuntu/
- A recent version of Docker Compose : https://docs.docker.com/compose/install/
- Preferably a machine with a minimum of 10 GB of available RAM, to store the visual indices in memory
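The prerequisites above can be verified with a short shell snippet before going further. This is only a convenience sketch (not part of the repository); the version strings and memory figures printed will vary by host:

```shell
# Check that Docker and Docker Compose are installed, and report available RAM.
if command -v docker >/dev/null 2>&1; then
  docker --version
else
  echo "Docker not found - see https://docs.docker.com/install/linux/docker-ce/ubuntu/"
fi

if command -v docker-compose >/dev/null 2>&1; then
  docker-compose --version
else
  echo "Docker Compose not found - see https://docs.docker.com/compose/install/"
fi

# Available memory in GB (Linux only; the platform prefers >= 10 GB free)
if command -v free >/dev/null 2>&1; then
  free -g | awk '/^Mem:/ { print "Available RAM: " $NF " GB" }'
fi
```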
The docker-compose.yml file is made up of the following containers:
- proxy : Proxy facade for all services based on nginx
- mysql : MySQL server for storing image metadata (URLs, modalities, captions, etc.)
- couchdb : CouchDB server for storing image annotation data from the Web Viewer
- uploads : Basic nginx instance for serving uploaded images
- images : Basic nginx instance for serving images of the datasets used for retrieval
- paradise-gf : Glassfish Java application server instance hosting the ParaDISE engine
- retrieval : Apache instance hosting the Shambala-based retrieval interface
- webviewer : Node.js server for the Web Viewer / Annotation tool
- iipsrv : IIPImage server for generating the image tiles for the Web Viewer
- slideprops : Python Web Service for extracting slide properties (using Openslide)
The following shared volumes are declared:
- upload-volume : Shared volume for uploaded images (served by the uploads container)
The following ports need to be open on the host machine to run all services correctly:
- 80 : Port 80 is used by the proxy facade to expose all underlying services
The .env.template file (to be copied to .env, as described below) defines the following environment variables:
- COUCHDB_ADMIN_USER : Username of the CouchDB administrator
- COUCHDB_ADMIN_PASS : Password for the CouchDB administrator
- COUCHDB_DB_NAME : Name of the database created for the Web Viewer annotation data
- COUCHDB_PROTOCOL : Protocol used by CouchDB (http by default, https can be managed by the proxy)
- COUCHDB_PORT : Default port to reach the CouchDB container (not exposed, but can be used internally by the other Docker containers)
- COUCHDB_HOST : Hostname of the CouchDB server (public-facing, localhost by default)
- COUCHDB_ROOT_URL : Public-facing URL of the CouchDB database
- COUCHDB_BACKEND_ROOT_URL : Backend URL for the CouchDB database
- PARADISE_ROOT_URL : Default URL of the ParaDISE engine
- RETRIEVAL_ROOT_URL : Default URL of the Retrieval Interface
- IIP_ROOT_URL : Default URL of the IIP server
- VIEWER_ROOT_URL : Default URL of the Web Viewer
- SLIDEPROPS_ROOT_URL : Default URL of the Slide Properties Web Service
- SLIDEVIEWERDATA_LOCAL_PATH : Default path of the data for the Web Viewer (uploaded images, converted images, overlays, etc.)
- PARADISE_LOCAL_PATH: Default path of the data for the ParaDISE retrieval system
Before running the platform, you should set up the folder structure for the Web Viewer data, as well as the retrieval system data.
The Web Viewer data is organised with the following hierarchy:
root
|-- converted
|   (Files converted automatically by the Web Viewer into the pyramidal,
|    Deep-Zoom-compatible TIFF format)
|-- overlays
|   |-- NAME_OF_UPLOADED_IMAGE.tif
|       |-- feature-name.png
|-- uploaded
    (WSI files uploaded via the Web Viewer interface)
The structure is fairly simple; only the "overlays" directory is special. It must contain a subfolder for each WSI uploaded to the platform, and each of these subfolders may contain one or more PNG files, each representing an overlay for that WSI.
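Assuming the hierarchy above, the Web Viewer data folders can be created with a few commands. This is a sketch only: the root path and the WSI name are placeholders to adjust to your setup.

```shell
# Sketch: set up the Web Viewer data hierarchy described above.
# ROOT and the WSI name are placeholders - adjust them to your setup.
ROOT="${ROOT:-$HOME/slideviewerdata}"

mkdir -p "$ROOT/converted"   # pyramidal, Deep-Zoom-compatible TIFFs
mkdir -p "$ROOT/uploaded"    # WSI files uploaded via the Web Viewer

# One subfolder per uploaded WSI, holding its PNG overlays
mkdir -p "$ROOT/overlays/NAME_OF_UPLOADED_IMAGE.tif"
```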
The Retrieval system data is organised with the following hierarchy:
root
|-- all-dmli-pubmed-info.sql
|   (SQL script containing all the info of a figure dataset with captions
|    and modalities, such as the PubMedCentral dataset. The columns are:
|    -------------------------------------------------------------
|    | id | url | thumbnailURL | articleURL | caption | modality |
|    ------------------------------------------------------------- )
|-- images
|   (All the patches and images that can be retrieved by the system,
|    organised by dataset and split into several subfolders, by
|    magnification or other arbitrary levels. Example shown below:)
|   |-- Dataset1 (WSI for example)
|   |   |-- 5X
|   |   |   |-- XYZ.tif_idx_0__lvl_3__x2688_y2240.png
|   |   |   |-- ...
|   |   |-- 10X
|   |       |-- ...
|   |-- Dataset2 (PubMedCentral for example)
|       |-- Subfolder1
|       |   |-- Sub-subfolder1
|       |   |   |-- 12-0309-F-3.jpg
|       |   |   |-- ...
|       |   |-- Sub-subfolder2
|       |       |-- ...
|       |-- ...
|-- paradise-files
    (All the files necessary for the ParaDISE backend to function, for both
     image and text retrieval. Details shown below:)
    |-- conf
    |   (Configuration files for all the visual indices available in the
    |    system, describing the parameters used for indexation, the storing
    |    mechanism, retrieval settings, etc. Refer to the ParaDISE website
    |    (http://paradise.khresmoi.eu/) for more details.)
    |   |-- pubmedcentral-config.json
    |   |-- ...
    |-- gt
    |   (Ground truth file for the classification algorithm used to
    |    automatically compute the image modality of a given image in a
    |    dataset.)
    |   |-- train20XXGT.csv
    |-- image-lists
    |   (Lists of URLs/paths of images to index.)
    |   |-- pubmedcentral-dmli.csv
    |   |-- ...
    |-- indices
    |   (CSV files with the features extracted from the images in each
    |    dataset.)
    |   |-- wsi-dataset-5x.csv
    |   |-- wsi-dataset-10x.csv
    |   |-- ...
    |-- lucene
    |   (Lucene indices containing the caption and fulltext information for
    |    datasets such as PubMedCentral.)
    |   |-- pubmedcentral-captions-2016-dmli
    |   |-- pubmedcentral-fulltext-2016
    |   |-- ...
    |-- vocabularies
        (Vocabularies for "Bag-of-Words"-based visual feature extraction.)
        |-- vocabulary_238.csv
        |-- ...
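Similarly, the skeleton of the retrieval-system data folder can be created up front. Again a sketch: the root path is a placeholder, and the subfolders of images/ depend entirely on your datasets, so only the fixed part of the hierarchy is created here.

```shell
# Sketch: set up the ParaDISE data hierarchy described above.
# PARADISE_ROOT is a placeholder - point it at your actual data folder.
PARADISE_ROOT="${PARADISE_ROOT:-$HOME/paradise-data}"

mkdir -p "$PARADISE_ROOT/images"
for d in conf gt image-lists indices lucene vocabularies; do
  mkdir -p "$PARADISE_ROOT/paradise-files/$d"
done
```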
Whether you created the folder structure above manually or received a sample ZIP file with some images, you need to map the data folders into the Docker containers by setting the SLIDEVIEWERDATA_LOCAL_PATH and PARADISE_LOCAL_PATH variables in your ".env" file. Copy or rename the .env.template file to .env, then set these last two variables to the paths of the folders you created:
# Linux example
SLIDEVIEWERDATA_LOCAL_PATH=/home/slideviewer/slideviewerdata/
PARADISE_LOCAL_PATH=/home/slideviewer/paradise-data/

# Windows example
SLIDEVIEWERDATA_LOCAL_PATH=C:/SlideViewerData/
PARADISE_LOCAL_PATH=C:/ParaDISEData/
Once these variables are defined, you should be able to correctly build and start all the containers.
To create/start/restart all containers (in background mode), type the following command in the terminal from the folder containing the docker-compose.yml file:
docker-compose -f docker-compose.yml up -d
To stop the containers, type:

docker-compose -f docker-compose.yml stop
To remove all the containers (clean-up), type:

docker-compose -f docker-compose.yml down
Wait about 4-5 minutes for all services to load correctly, then open http://localhost in your browser to access the Web Viewer. (NOTE: Using Google Chrome is recommended.)
You can then log in with one of the following default user accounts defined in the docker-entrypoint.sh file in the desuto-couchdb directory:
- User : user, Password : userpass (Read-only access)
- User : pathologist1, Password : pathologistpass
- User : pathologist2, Password : pathologistpass
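A login can also be checked directly against CouchDB's standard session endpoint. The /couchdb path below is an assumption about how the proxy exposes CouchDB (it depends on your COUCHDB_ROOT_URL setting) and may differ on your host; on success, CouchDB returns a JSON body containing "ok":true.

```shell
# Hypothetical check of a default account against CouchDB's _session API.
# The /couchdb prefix is an assumption - adjust it to your COUCHDB_ROOT_URL.
# "|| true" keeps the snippet harmless when the stack is not running.
curl -s -X POST "http://localhost/couchdb/_session" \
     -H "Content-Type: application/json" \
     -d '{"name": "user", "password": "userpass"}' || true
```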
The current repository contains all necessary parts for getting up and running with the viewer, annotation tool and retrieval system, but lacks some of the required data (images, indices, etc.) for a completely working system.
It also does not include the Deep Learning Python services for computing and extracting DL features from images; these will be added in the future.
If you would like to receive some sample data for quickly exploring the features of the platform, don't hesitate to contact the authors for help.
Available under the Apache License 2.0. See LICENSE for more information.
Roger Schaer - email@example.com
Sebastian Otalora - firstname.lastname@example.org
Please contact the authors to ask about adding new contributions to the platform.