Sharing Data: Version 14 vs 15
Edits
- Edit by oburri, Version 15
- Mar 10 2022 12:09
- Edit by romainGuiet, Version 14
- Dec 17 2020 09:21
Content Changes
(IMPORTANT) The BIOP recommends that you make your Image Processing and Analysis workflow publicly available.
To do so, please publish:
- the dataset (or at least a part of it, e.g. the images used in one of the figures),
- the script(s),
- a brief description of the **Image Processing and Analysis Workflow** (we'll provide one for the scripts we made for you).
Several platforms exist. Please check the //non-exhaustive// list below and do not hesitate to [[ https://www.epfl.ch/research/facilities/ptbiop/staff/ | contact us ]].
Please have a look at the [[ https://www.epfl.ch/research/open-science/in-practice/sharing-research-data/ | EPFL resources about Sharing Research Data ]].
= 1. Sharing Data before publication =
| Service/Platform | Data Limit | Limitation |
|---|---|---|
| [[ https://www.epfl.ch/campus/services/ressources-informatiques/stockage-des-documents/ | SWITCH Drive ]] | 100 GB | confidential documents may be stored (since 01.01.2019) |
| [[ https://wiki.epfl.ch/help-gdrive-en | Google Drive ]] | unlimited | no legally protected documents |
= 2. Sharing Data of a Publication =
| Platform | Size Limit per Dataset | Increase the limit | Preview of Files | Example | Conditions in brief |
|---|---|---|---|---|---|
| [[ https://zenodo.org/ | Zenodo ]] | 50 GB | contact them | Preview of PDF files | [[ https://zenodo.org/record/4058414#.X6PcBkeSmUk | DEVILS dataset ]] | |
| [[ https://figshare.com/ | figshare ]] | 20 GB | with fees | ? | [[ https://figshare.com/collections/Data_DEVILS_a_tool_for_the_visualization_of_large_datasets_with_a_high_dynamic_range/5197940 | DEVILS dataset ]] | limited |
| [[ https://idr.openmicroscopy.org/ | IDR ]] | 1000 GB | contact them | images can be explored via OMERO | [[ https://idr.openmicroscopy.org/search/?query=Name:idr0061 | Time-lapse ]] from [[ https://www.nature.com/articles/s41467-019-10446-z#MOESM4 | Wolf et al. 2019 ]] | Complete datasets (not just the images supporting one figure of the publication), preferably under a CC-BY licence; [[ https://idr.openmicroscopy.org/about/submission.html | see more about submission ]] |
== Zenodo ==
{F15524715, size=full}
== IDR ==
{F15524836,size=full}
Explore the dataset online:
{F15524851,size=full}
= **Upload a dataset to the IDR** =
NOTE: This section summarizes what Oli learned from working with the IDR team when submitting the data for the MarrowQuant publication.
== VSI File preparation ==
1. Make sure all VSI file names are unique. Generic names such as `Image_01.vsi` or `Image_02.vsi` become a problem when you share several projects and their file names clash.
2. Make sure that each VSI file contains exactly **one** series. A VSI file can hold many series, along with other images (thumbnails, labels, overviews), and not all of these series necessarily need to be analyzed or shared. **However, if they are together in the same VSI file, one cannot share just a subset.**
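The uniqueness check from point 1 can be automated before anything is uploaded. The following is a minimal sketch, assuming that all projects to be shared sit under one common root folder; the function name `find_duplicate_vsi_names` is hypothetical and not part of any BIOP or IDR tool.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_vsi_names(root):
    """Return {file name: [paths]} for every .vsi name found more than once under root."""
    by_name = defaultdict(list)
    for path in Path(root).rglob("*"):
        # Compare names in lower case, because Windows treats
        # Image_01.vsi and image_01.VSI as the same file name.
        if path.is_file() and path.suffix.lower() == ".vsi":
            by_name[path.name.lower()].append(path)
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}
```

Calling `find_duplicate_vsi_names("D:/projects")` (an example path) returns an empty dictionary when all names are unique; otherwise each key is a clashing name and the value lists where it occurs, so the files can be renamed before submission.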
=== Option 1: Save series separately directly from the scanner ===
{F18700809, size=full}
Use {nav icon=check-square-o, name=New document for each scan}
=== Option 2: Convert your VSI images to OME-TIFF ===
WARNING: You will need about 10x the space on disk for this operation.
VSI files are quite compressed, whereas the OME-TIFF conversion using `bioformats2raw` and `raw2ometiff` (as suggested by Bio-Formats and IDR) generates lossless images with less compression.
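Before starting a long conversion, it can help to check whether the target disk has enough room. The sketch below simply applies the "about 10x" rule of thumb from the warning above; the factor is a rough estimate rather than a measured value, and the function names are hypothetical. It sums every file under the project folder, since the pixel data of a VSI file usually lives in a companion folder next to it.

```python
import shutil
from pathlib import Path


def folder_size_bytes(folder):
    """Total size of all files under folder, companion folders included."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def has_room_for_conversion(source, destination, factor=10):
    """Return (enough_space, needed_bytes, free_bytes).

    factor=10 is the rule of thumb from this page, not an exact figure.
    """
    needed = folder_size_bytes(source) * factor
    free = shutil.disk_usage(destination).free
    return free >= needed, needed, free
```

If the first returned value is `False`, either free up space or convert the projects in smaller batches.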
==== Install the Bio-Formats Converters ====
Instructions from Petr Walczysko, IDR
Once Miniconda is installed:
```
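# Create a dedicated environment and activate it (the names must match)
conda create --name bioformats-conversion
conda activate bioformats-conversion
# Install both converters from the OME pre-release channel
conda install -c ome/label/pre bioformats2raw
conda install -c ome/label/pre raw2ometiff
# Install the Python bindings for the Blosc compressor
pip install blosc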
```
Run the following script on all projects that need the conversion:
{F18702320}
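The attached script is not reproduced here. As an illustration only, the two-step conversion could be driven as in the sketch below, which is not the attached script. It assumes that `bioformats2raw` and `raw2ometiff` are on the `PATH` (i.e. the conda environment is active) and that each tool is called with just an input and an output argument; the `_raw` suffix for the intermediate folder and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def conversion_commands(vsi_file, output_dir):
    """Build the two commands that turn one VSI file into an OME-TIFF."""
    vsi_file, output_dir = Path(vsi_file), Path(output_dir)
    raw_dir = output_dir / (vsi_file.stem + "_raw")  # intermediate pyramid (hypothetical name)
    ome_tiff = output_dir / (vsi_file.stem + ".ome.tiff")
    return [
        ["bioformats2raw", str(vsi_file), str(raw_dir)],
        ["raw2ometiff", str(raw_dir), str(ome_tiff)],
    ]


def convert_project(project_dir, output_dir, dry_run=True):
    """Convert every .vsi file under project_dir; with dry_run, only list the commands."""
    if not dry_run:
        Path(output_dir).mkdir(parents=True, exist_ok=True)
    commands = []
    for vsi_file in sorted(Path(project_dir).rglob("*.vsi")):
        for command in conversion_commands(vsi_file, output_dir):
            commands.append(command)
            if not dry_run:
                subprocess.run(command, check=True)  # stop at the first failure
    return commands
```

Running with `dry_run=True` first makes it easy to review the commands and the output names before committing disk space to the actual conversion.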
== Checking hashes when sending the files ==
All files should be hash-checked, so that the receiver can verify that no file was corrupted during the transfer. Generate a text file containing the hashes and send it along with the data.
Here is a script that does this in **PowerShell**:
```lang=ps
Get-ChildItem -Path YOURFOLDER -Recurse -Force -File |
Get-FileHash |
Sort-Object -Property 'Path' |
Export-Csv -Path "file-hashes.csv" -NoTypeInformation
```
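On the receiving side, the CSV file can be used to re-check the transferred files. The sketch below assumes the CSV has the columns `Algorithm`, `Hash` and `Path` that `Get-FileHash` produces, that the default SHA-256 algorithm was used, and that the `Path` column holds absolute Windows paths from the sender's machine. The function names and the `sender_root`/`local_root` parameters are hypothetical.

```python
import csv
import hashlib
from pathlib import Path, PureWindowsPath


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 of a file in chunks, so large images do not fill the memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_hashes(csv_file, sender_root, local_root):
    """Return the relative paths that are missing locally or whose hash differs."""
    problems = []
    with open(csv_file, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            # Map the sender's absolute Windows path onto the local copy.
            relative = PureWindowsPath(row["Path"]).relative_to(PureWindowsPath(sender_root))
            local_file = Path(local_root).joinpath(*relative.parts)
            if not local_file.is_file() or sha256_of(local_file) != row["Hash"].upper():
                problems.append(relative.as_posix())
    return problems
```

An empty list means that every file listed in the CSV arrived intact; any other result names the files that should be transferred again.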
== Output data ==
If there is exactly one result row per image, the results can be sent as a single CSV table.
If there is more than one row per image, which is typical when statistics are computed on multiple regions of an image, the results must be attached to each image as a separate CSV file. These files are then referenced in a table that contains, again, exactly one row per image.
=== Example ===
Images have multiple result rows:
1. Provide a table in which each row corresponds to exactly one image, with a column that references the CSV file holding the multi-row results for that image.
|Image_Name|Metadata_1| Metadata_2| .... | Results_File |
|---|---|---|---|---|
| Experiment1.ome.tiff| Control| Human| ... | Experiment1.csv|
| Experiment4.ome.tiff| Control| Human| ... | Experiment4.csv|
| Treated1.ome.tiff| Treated| Human| ... | Treated1.csv|
Each CSV file then contains the results for that one image and is sent as a separate file.
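The "exactly one row per image" rule and the file references can be checked automatically before submission. The following is a minimal sketch, assuming the table is saved as a CSV file with the `Image_Name` and `Results_File` columns used in the example above and that all per-image CSV files sit in one folder; the function name is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def check_results_table(table_file, results_dir,
                        image_column="Image_Name", results_column="Results_File"):
    """Return a list of human-readable problems found in the one-row-per-image table."""
    with open(table_file, newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.DictReader(handle))
    problems = []
    # Every image may appear only once in the table.
    for image, count in Counter(row[image_column] for row in rows).items():
        if count > 1:
            problems.append(f"{image}: {count} rows, expected exactly one")
    # Every referenced per-image results file must exist.
    for row in rows:
        if not (Path(results_dir) / row[results_column]).is_file():
            problems.append(f"{row[image_column]}: missing results file {row[results_column]}")
    return problems
```

An empty list means the table is consistent with the files on disk.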
= **Download a dataset from the IDR** =
The raw data of all studies published in IDR can be downloaded using the Aspera desktop client.
Each published study has an associated passwordless username matching the IDR accession number e.g. idr0001.
== Option 1: Using Fluorescent Platypus ==
On Fluorescent Platypus, log in with the biopstaff account.
Open the IBM Aspera Desktop Client.
{F23481211}
Hit the {nav Connections} button to open the connection manager. Hit the {nav +} button to add a new connection and enter the following:
- **Host**: fasp.ebi.ac.uk
- **User**: idrNNNN where NNNN is the IDR study identifier, e.g. idr0047
- **Authentication**: Public Key
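When several studies have to be downloaded, the connection values listed above can be derived from the study number. This is only a small sketch of the `idrNNNN` naming rule described on this page; the function name is hypothetical, and the values still have to be entered in the Aspera connection manager by hand.

```python
import re


def idr_aspera_connection(study):
    """Return the connection settings for a study given as 47, "47", "0047" or "idr0047"."""
    match = re.fullmatch(r"(?:idr)?(\d{1,4})", str(study).strip().lower())
    if match is None:
        raise ValueError(f"not an IDR study identifier: {study!r}")
    return {
        "host": "fasp.ebi.ac.uk",
        "user": f"idr{int(match.group(1)):04d}",  # four-digit accession number
        "authentication": "Public Key",
    }
```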
{F23481396, size=full}
Go to {nav Manage Keys > Import a key from the filesystem} and select {nav asperaweb_id_dsa.openssh} (this only needs to be done if you haven’t previously imported this key).
Select {nav asperaweb_id_dsa.openssh} in the list of keys.
Click on {nav Test Connection}
Hit the {nav Ok} button
You should now be able to connect to the Aspera server and see the raw data for your chosen IDR study.
Choose the destination location in the left panel. In the right panel, choose the files to download.
Click on the right-to-left {icon arrow-left} arrow to launch the download.
{F23481431, size=full}
Monitor the progress of the download in the bottom pane.
== Option 2: Install the Aspera client on another computer ==
Download and install the **IBM Aspera Desktop Client** from the **Aspera client-deployed software** section of the [[ https://www.ibm.com/products/aspera/downloads | Aspera downloads site ]].
In the list of versions, choose the latest version from the **3.9 series** of the Desktop client. There is currently a connection issue with all the **4.x series**.
{F23481443, size=full}
You need to log in to an IBM account (or create one) to download the client.
Download the Aspera public key {nav asperaweb_id_dsa.openssh}:
{F23480992}
Install the IBM Aspera Desktop Client and follow the steps from "Option 1" above.