WARNING: There is a problem in the fuser implementation which can lead to overflow. Bright pixels may overflow in regions where the summed value reaches 65 536 or more (beyond the unsigned 16-bit range).
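To make the warning concrete, here is a minimal sketch (plain Java, not the actual fuser code) of what happens when bright unsigned 16-bit pixels are summed and the result is written back into 16 bits without clamping:

```java
// Hypothetical illustration of unsigned 16-bit overflow, not the fuser implementation.
public class OverflowDemo {
    public static void main(String[] args) {
        int a = 40_000, b = 30_000;  // two bright pixels, both valid unsigned 16-bit values
        int sum = a + b;             // 70_000: exceeds the unsigned 16-bit range (0..65_535)
        int stored = sum & 0xFFFF;   // written back into 16 bits without clamping: 4_464
        System.out.println(sum + " wraps to " + stored);
    }
}
```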
Vocabulary:
* a source = a single channel image, which can have several slices and frames.
* a cell = a 3D block of pixels (64x64x64 for instance) within a source
== BigStitcher fusion: known issues ==
It may happen that multi-tile images much bigger than RAM are not fused fast enough with BigStitcher.
There are several reasons for that:
* The dataset is loaded with [Soft References](https://www.baeldung.com/java-soft-references) by default, meaning that, as you stream the data from disk, as many cells as possible are kept in RAM. The RAM only starts to be freed when it is almost full. There are VM options (`-XX:SoftRefLRUPolicyMSPerMB=2500`) that aim at speeding up memory release, but in practice they are not aggressive enough (and I am not sure whether the oldest cells are removed first). This leads to performance issues for big datasets: when the RAM is full, freeing memory becomes computationally costly, which slows down the fusion, and you may even get out-of-memory errors, even though Soft References are supposed to guarantee that this does not happen. (A minimal sketch of this behaviour is shown after this list.)
* The fusion process, at least in the default BigStitcher way, iterates over all tiles for each fused pixel. This becomes particularly bad when you have many non-overlapping tiles (many tiles in 2D = very bad).
* With the HDF5 format, reading is not parallel, but that is a minor issue compared to the other two.
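For reference, here is a minimal sketch of how soft references behave (plain Java, nothing specific to BigStitcher): the cached data stays alive until the JVM is about to run out of memory, and the `-XX:SoftRefLRUPolicyMSPerMB` flag only tunes how long unused soft references are kept per MB of free heap.

```java
import java.lang.ref.SoftReference;

// Minimal sketch of soft-reference caching (not BigStitcher code): the cell data
// remains reachable until the JVM is close to running out of memory.
public class SoftRefSketch {
    public static void main(String[] args) {
        byte[] cell = new byte[64 * 64 * 64];            // one "cell" worth of pixels
        SoftReference<byte[]> cached = new SoftReference<>(cell);
        cell = null;                                     // only the soft reference keeps it alive

        // ... later, when the cell is needed again:
        byte[] data = cached.get();                      // may be null if the GC reclaimed it
        System.out.println(data == null ? "cell was evicted" : "cell still cached");
    }
}
```

Running Fiji (or this sketch) with `-XX:SoftRefLRUPolicyMSPerMB=2500` changes how eagerly such references are cleared, but as noted above it is rarely aggressive enough for datasets much bigger than RAM.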
Note: [BigStitcher Spark](https://github.com/PreibischLab/BigStitcher-Spark/blob/main/README.md) also exists and should be faster, but I did not test it. It is not integrated into Fiji (it is a command-line tool), and it works with 16-bit images only.
== BigDataViewer-Playground fusion workflow ==
Here is a workflow that addresses the two main issues: the dataset to be fused is opened with a bounded cache, meaning memory is cleared efficiently before the RAM gets full, and the fusion runs block by block, selecting first the tiles that overlap each block before fusing it.
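Conceptually, the block-wise strategy looks like the sketch below (hypothetical Java pseudo-code, not the actual BigDataViewer-Playground implementation; `Block`, `Tile`, `overlaps` and `fusePixel` are made-up placeholders): the tiles are filtered once per block, so each fused pixel only loops over the few tiles that actually overlap its block.

```java
// Hypothetical sketch of block-wise fusion with tile pre-filtering.
// Block, Tile, overlaps(...) and fusePixel(...) are made-up placeholders,
// not BigStitcher or BigDataViewer-Playground API.
for (Block block : fusedImageBlocks) {
    // 1. Filter once per block: keep only the tiles overlapping this block.
    List<Tile> candidates = new ArrayList<>();
    for (Tile tile : allTiles) {
        if (overlaps(tile, block)) candidates.add(tile);
    }
    // 2. Fuse the pixels of the block using only the candidate tiles,
    //    instead of iterating over every tile of the dataset for each pixel.
    for (long[] pixel : block.pixels()) {
        fusePixel(pixel, candidates);
    }
}
```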
This workflow was developed to support OME-TIFF (QuPath) export with 8-bit, 16-bit, and RGB pixels, and to make sure that big 2D planes work as well. Export of 16-bit images to XML/HDF5 also works.
Enabling the BigDataViewer-Playground update site is compulsory for this workflow. It is also recommended to add the demo [tasks update site](https://forum.image.sc/t/demo-and-proposal-new-progress-bars-for-fiji/64956). Its URL is `https://biop.epfl.ch/Fiji-Task-Demo/`.
=== Step 1. Open the dataset with bounded number of cells ===
Look for this command:
{F25815026}
Select your xml dataset file:
{F25815040}
The most important parameter is `MaxNumberOfCells`, which sets the maximum number of cells kept in memory. It is not obvious to set correctly because the cell size depends on the way the dataset was saved. My usual bet is to count 1 MB per cell, so setting it to 1000 should keep the cache in the GB range.
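As a back-of-the-envelope check (my own estimate, the real cell size depends on how the dataset was written):

```java
// Rough estimate of the cache bound, assuming 64x64x64 cells of 16-bit pixels.
long cellDim = 64;                 // cell edge length, dataset dependent
long bytesPerPixel = 2;            // 16-bit pixels
long bytesPerCell = cellDim * cellDim * cellDim * bytesPerPixel;  // 524_288 bytes ≈ 0.5 MB
long maxNumberOfCells = 1000;
long cacheBytes = maxNumberOfCells * bytesPerCell;                // ≈ 500 MB
System.out.println("Cache bound ≈ " + cacheBytes / (1024 * 1024) + " MB");
```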
`NumberOfFetcherThreads` can be left at 1 with the xml/hdf5 image loader because parallel reading is not enabled.
I'm still unclear what `NumberOfPriorities` does.
After clicking OK, you end up with a window with nodes that can be expanded:
{F25815096}
=== Step 2. Define the fusion model ===
In this step, we define a dummy image (or source) that serves as the template for the fusion. The dummy image defines a portion of the physical space (3D bounds) sampled on a certain grid (= voxel size).
Usually, with BigStitcher, you want this template to span all tiles, and by convention the voxel size is 1 in xy and some other value in z (the z spacing is often bigger than the xy spacing).
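In ImgLib2/BigDataViewer terms, such a model essentially boils down to an interval (the pixel bounds) plus an affine transform encoding the voxel size. Here is a minimal sketch of that idea (made-up numbers, and not the code executed by the command below):

```java
import net.imglib2.FinalInterval;
import net.imglib2.realtransform.AffineTransform3D;

// Minimal sketch of what the fusion model encodes (made-up numbers).
FinalInterval bounds = new FinalInterval(20_000, 20_000, 100); // nx, ny, nz in pixels
AffineTransform3D voxelToPhysical = new AffineTransform3D();
voxelToPhysical.set(
        1.0, 0, 0, 0,   // 1 physical unit per pixel in x
        0, 1.0, 0, 0,   // 1 physical unit per pixel in y
        0, 0, 4.0, 0 ); // z spacing larger than xy, e.g. 4 units per plane
```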
To define the model, look for this command:
{F25815189}
And here is an example of settings used:
{F25815193}
Notice that the model appears in the bdv-playground tree view:
{F25815204}
NOTE: the model is actually a black image. The advantage of defining the model as an image is that any existing image can serve as the fusion template. For instance, if you position a mouse atlas correctly in physical space, you can directly use the atlas image as a fusion template.
=== Step 3. Make the fused source ===
Look for this command:
{F25815315}
Select:
* the sources you need to fuse (only pick a single channel)
* the model
* click `Cache` -> this enables the block-wise fusion with pre-filtering of tiles.
* choose a block size. If you go for OME-TIFF export, it is best to select a z block size of 1. Typically 512*512*1 or 1024*1024*1 is a reasonable size.
* number of blocks kept in memory: that is the number of blocks of the fused image kept in RAM. If you set it to 100 with a block size of 1024*1024, the fused image will never exceed about 100 MB of RAM (see the estimate after the screenshot below). Put a negative value if you want to keep all blocks in RAM (with soft references). This number should be higher than NThreads.
* NThreads: the number of threads used to compute blocks in parallel. Advised: use the number of cores of your CPU.
* give a name (specifying the channel if you have multiple channels)
* blending mode: usually average if you perform fusion.
{F25815409}
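Hedged back-of-the-envelope arithmetic for the RAM bound (the exact figure depends on the pixel type of your fused source):

```java
// Estimate of the RAM used by the cached fused blocks (assumptions noted inline).
long blockWidth = 1024, blockHeight = 1024, blockDepth = 1;
long bytesPerPixel = 1;            // 1 for 8-bit, 2 for 16-bit, 3-4 for RGB(A)
long blocksKeptInMemory = 100;
long ramBytes = blockWidth * blockHeight * blockDepth * bytesPerPixel * blocksKeptInMemory;
System.out.println("Fused image cache ≈ " + ramBytes / (1024 * 1024) + " MB"); // ≈ 100 MB here
```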
Repeat this procedure for each channel.
In the demo, I have only one channel:
{F25815415}
NOTE: you can already investigate the result of the fusion before running the export. Simply right-click on the fused image(s) in the tree and select `BDV - show sources`. This is possible thanks to the block computation and the on-demand rendering of BigDataViewer.
=== Step 4. Export fused source to OME-TIFF ===
Look for the following command:
{F25815467}
(You may choose the other command, `Export Sources To OME Tiff`: it keeps pyramidal levels if they already exist, but recomputing them may be faster.)
Fill in:
* The source you want to export (fused sources, several ones if you have multiple channels)
* You can leave the selected channels, slices and timepoints blank, or select a subset. For instance, putting `0:10:-1` will pick 1 slice every 10 slices; `-1` stands for the index of the last slice.
* I generally use a tile size that matches the block size specified for the fused source.
* Use all cores for the threads.
* Number of tiles computed in advance: generally a few times the number of cores. It ensures that there are always some jobs enqueued for the fusion (see the sketch below).
Compression is highly recommended (you may have plenty of black regions) and does not create much overhead.
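As a hypothetical rule of thumb for these parameters (my suggestion, not values taken from the exporter): match the export tile size to the fused-source block size, and queue a few tiles per core so the fusion threads never starve.

```java
// Hypothetical rule of thumb for the export settings (not hard-coded values).
int cores = Runtime.getRuntime().availableProcessors(); // use all cores for the export threads
int tileSizeX = 1024, tileSizeY = 1024;                 // match the fused-source block size
int tilesComputedInAdvance = 4 * cores;                 // a few queued jobs per core
System.out.println(cores + " threads, " + tilesComputedInAdvance + " tiles computed in advance");
```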
For instance:
{F25815545}
THEN IT BEGINS!
You can click on the little rectangle next to Fiji's search bar if you activated the tasks update site (see above).
You end up with this window:
{F25815571}
The first bar is not really a task: it indicates the number of blocks currently queued for writing or actively being fused.
The second bar shows the overall progress. It may take some time, but it becomes linear after a while.
You can now check that the resources are used correctly. Use the `Memory monitor` of Fiji (look in the search bar). It should have a saw-tooth pattern and should not drift upwards:
{F25815622}
You can set a low RAM limit (like 4 GB); this should not cause any problem.
You can check whether your CPU is used efficiently; on Windows you can do that with the Task Manager:
{F25815634}
In this picture, we see that only 30% of the CPU is used, which is not optimal but not terrible either. The progress is linear over time, so the process will end. Tuning the fetcher threads, the block size and the number of cores used may improve the performance.
=== Step 4 (alternative). Export fused source to XML/HDF5 ===
Look for this command:
{F25815645}
Here is an example settings:
{F25815652}
Tasks do not work here, but you can have a look at the h5 file being written to follow the progress.