- Optimal parameters for other algorithms:
- pick the best parameters based on the middle image of the test set (the first image can be biased), using quick visual inspection or our automatic evaluation
- to show that we tried different parameters, we can either keep a history of the tested parameters (with their results) or simply state which general directions (with the actually tested parameters) were explored
It would be nice to have every algorithm batched (CellID, CellProfiler, CellTracer, Tracker): just give the parameters, data source, and output path, and go for tea.
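A minimal sketch of such a batch driver in Python, assuming each tool can be invoked from the command line; the command templates below are placeholders, not the tools' actual CLIs:

```python
# Sketch of a batch driver: one entry per algorithm, each run fully
# described by (parameters, data source, output path).
import subprocess

RUNNERS = {
    # hypothetical command templates - fill in each tool's real CLI
    "CellID":       "cellid --params {params} --in {src} --out {dst}",
    "CellProfiler": "cellprofiler -p {params} -i {src} -o {dst}",
}

def run_batch(jobs):
    """jobs: list of dicts with keys algorithm, params, src, dst."""
    for job in jobs:
        cmd = RUNNERS[job["algorithm"]].format(**job)
        subprocess.run(cmd, shell=True, check=True)
```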
- Data sets:
We have 2 sources and 5 different data sets (ranging from a few cells to an entire colony); from these we can derive more test sets, divided into three cases:
- single cells/small colonies (currently TS1 and TS2)
- cell/colonies movement/merging (currently TS3)
- highly clustered cells (currently TS4 and TS5)
Further idea: using these data sets we can also test how the algorithms react to (see the sketch after this list):
- low temporal resolution (by skipping frames; we need to know the initial image sequence interval). This can be done only on the full TS1-5 (we would not be able to get ground truth for them). A smart way to use this data needs to be invented.
- out-of-focus images (by introducing Gaussian blur)
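A minimal sketch of both derivations, assuming frames are stored as individual grayscale TIFF files; the directory layout, skip factor, and blur sigma are illustrative choices:

```python
# Sketch: deriving stress-test variants of an image sequence.
from pathlib import Path

import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import io

def derive_test_variants(src_dir, dst_dir, skip=3, blur_sigma=2.0):
    """Write a frame-skipped and a blurred copy of the sequence."""
    frames = sorted(Path(src_dir).glob("*.tif"))
    out = Path(dst_dir)
    (out / "skipped").mkdir(parents=True, exist_ok=True)
    (out / "blurred").mkdir(parents=True, exist_ok=True)
    for i, frame in enumerate(frames[::skip]):  # lower temporal resolution
        img = io.imread(frame)  # assumes single-channel frames
        io.imsave(out / "skipped" / f"frame_{i:04d}.tif", img)
        # simulate out-of-focus images with a Gaussian blur
        blurred = gaussian_filter(img.astype(np.float32), sigma=blur_sigma)
        io.imsave(out / "blurred" / f"frame_{i:04d}.tif",
                  blurred.astype(img.dtype))
```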
- Evaluation:
Visual inspection: looking through the results of segmentation/tracking and searching for cases where the algorithms have problems and make errors. Special attention to problematic cases:
- clustering of the cells
- early bud detection
- tracking where there is movement
It is also important to visually compare the contour extraction of the algorithms (as for the moment there is no ground truth for that).
Automatic evaluation based on ground truth (GT):
Ground truth creation method - we need clear instructions on how it is to be created:
- how small a cell we still count
- what to do if the cell is in the background
- when a bud is considered a new cell
We may simply correct the results of one of the algorithms (question: how simple can ground truth creation then become?).
How we want to choose the frames for ground truth comparison (assuming that it is very time consuming):
- consecutive frames (in case of movement - the tracking test)
- uniformly spread (mostly for segmentation - because it may be hard to find the cells after that much time)
Comparison criteria
Segmentation: there are a number of principally similar measures:
- false positives/negatives [Bao 2007]
- oversegmentation/undersegmentation [Waleby 2004, Zhoul 2006/9]
- precision/recall [Sansone 2011]
We can use the Sansone approach (counting result-GT correspondences) and plot (all values are in [0,1]):
- cells found / cells in GT
- precision and recall
How to calculate:
Let R be the set of cells in the results and G the set of cells in the ground truth.
We find a correspondence matrix C between R and G. Let c be the number of cells in G that have a corresponding cell in R. Then:
precision = c/|R|, recall = c/|G|, F = 2*precision*recall/(precision+recall)
It is important to always consider both precision and recall because the algorithm can always be trimmed to increase only one of them.
As Sansone does, we can use LP to find the correspondence (currently it is greedy).
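A minimal sketch of the matching and the measures, using SciPy's assignment solver in place of a general LP; cells are assumed to be given as boolean masks, and the 0.5 overlap threshold for accepting a match is an illustrative choice, not Sansone's:

```python
# Sketch: match result cells R to ground-truth cells G by maximizing
# total overlap, then compute precision, recall, and F.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_measures(result_masks, gt_masks, threshold=0.5):
    # correspondence matrix C[r, g] = IoU of result cell r and GT cell g
    C = np.array([[iou(r, g) for g in gt_masks] for r in result_masks])
    rows, cols = linear_sum_assignment(-C)       # maximize total overlap
    c = int((C[rows, cols] >= threshold).sum())  # accepted matches
    precision = c / len(result_masks)
    recall = c / len(gt_masks)
    f = 2 * precision * recall / (precision + recall) if c else 0.0
    return precision, recall, f
```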
Additionally we can calculate overall precision, recall (in the whole sequence) and F measure [Sansone 2011].
Tracking:
To evaluate tracking we will use precision and recall measures based on the number of "correct trackings" (correct links [Mael 2011]). This is much less sensitive to noise than whole-trajectory matching [Mael 2011].
How to calculate: a link is a pair of two consecutive points in a cell trajectory. A correct link is one where the algorithm finds both points of the link and recognizes that they belong to the same cell.
As in the case of segmentation, we can plot precision and recall, as well as calculate their overall values and F.
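A minimal sketch of the link-based measures, assuming trajectories are ordered lists of detection ids and that `correspondence` (result detection -> matched GT detection) comes from the segmentation matching step above:

```python
# Sketch: link-based precision/recall for tracking.

def links(trajectories):
    """Set of (a, b) pairs of consecutive detections over all trajectories."""
    out = set()
    for traj in trajectories:
        out.update(zip(traj, traj[1:]))
    return out

def link_measures(result_trajs, gt_trajs, correspondence):
    gt_links = links(gt_trajs)
    result_links = links(result_trajs)
    # a result link is correct if both endpoints match GT detections
    # that are themselves linked in the ground truth
    correct = sum(
        (correspondence.get(a), correspondence.get(b)) in gt_links
        for a, b in result_links
    )
    precision = correct / len(result_links) if result_links else 0.0
    recall = correct / len(gt_links) if gt_links else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f
```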
Statistics: we can also calculate various statistics and compare the algorithms without ground truth:
- sum of the sizes of the cells found / in GT
- average cell size
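A minimal sketch for such statistics, assuming each segmentation result is available as a labelled mask (0 = background, 1..N = cell labels); sizes are in pixels:

```python
# Sketch: ground-truth-free size statistics from a labelled mask.
import numpy as np

def size_stats(label_mask):
    labels, counts = np.unique(label_mask[label_mask > 0],
                               return_counts=True)
    return {
        "cell_count": int(labels.size),
        "total_cell_area": int(counts.sum()),
        "average_cell_size": float(counts.mean()) if counts.size else 0.0,
    }
```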