Library for measuring performance at the level of:
- events (e.g. epilepsy episodes): a predicted event counts as a match if it overlaps a true event at all, regardless of how long the overlap is
- duration (or sample-by-sample): the classical performance metric, which scores the classification of each individual sample
- a combination of both (mean and geometric mean of the event-based and duration-based F1 scores)
- number of false positives per day (useful for biomedical applications such as epilepsy monitoring)
It was designed for epilepsy detection and monitoring, but it can be used for other applications.
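
To make the four levels above concrete, here is a minimal sketch in plain Python. It is not this library's API: the function names (`to_events`, `overlaps`, `f1`, `score`), the binary label format, and the `fs` sampling-frequency parameter are all assumptions made for illustration. It also assumes that false positives per day are counted at the event level, and it ignores details such as merging nearby events or tolerance windows.

```python
from math import sqrt

def to_events(labels):
    """Return (start, end) index pairs, end exclusive, for each run of 1s."""
    events, start = [], None
    for i, v in enumerate(labels):
        if v and start is None:
            start = i
        elif not v and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(labels)))
    return events

def overlaps(a, b):
    """True if two half-open intervals share at least one sample."""
    return a[0] < b[1] and b[0] < a[1]

def f1(tp, fp, fn):
    """F1 score from counts; defined as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def score(true, pred, fs=1.0):
    """Hypothetical scorer: `true` and `pred` are equal-length 0/1 lists,
    `fs` is the sampling frequency in Hz (an assumption of this sketch)."""
    # Duration level: every sample is counted on its own.
    tp_d = sum(1 for t, p in zip(true, pred) if t and p)
    fp_d = sum(1 for t, p in zip(true, pred) if not t and p)
    fn_d = sum(1 for t, p in zip(true, pred) if t and not p)

    # Event level: any overlap between a true and a predicted event is a match.
    true_ev, pred_ev = to_events(true), to_events(pred)
    tp_e = sum(1 for t in true_ev if any(overlaps(t, p) for p in pred_ev))
    fn_e = len(true_ev) - tp_e
    fp_e = sum(1 for p in pred_ev if not any(overlaps(p, t) for t in true_ev))

    f1_event = f1(tp_e, fp_e, fn_e)
    f1_duration = f1(tp_d, fp_d, fn_d)
    return {
        "f1_event": f1_event,
        "f1_duration": f1_duration,
        "f1_mean": (f1_event + f1_duration) / 2,
        "f1_gmean": sqrt(f1_event * f1_duration),
        # Unmatched predicted events, scaled to a 24-hour recording.
        "fp_per_day": fp_e * 86400 * fs / len(true),
    }

true = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
pred = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]
result = score(true, pred, fs=1.0)
print(result)
```

In this toy input the first predicted event overlaps the true event only partially, so it is a full match at the event level but costs one missed and one spurious sample at the duration level. The second predicted event overlaps nothing and counts as a false positive at both levels.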