gpt2 debugged
percentages of tp, tn, fn and fp do not sum to 100%?
performance is generally higher than previously reported; is there a bug?
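
possible sanity check for the tp/tn/fn/fp percentages: a minimal sketch, assuming binary 0/1 labels and that every example falls into exactly one of the four cells. `confusion_percentages` and the sample labels are hypothetical, not taken from the actual evaluation code.

```python
from collections import Counter


def confusion_percentages(y_true, y_pred):
    """Return tp/tn/fp/fn as percentages of all examples (binary 0/1 labels).

    Hypothetical helper for illustration; not the project's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no examples to score")

    counts = Counter()
    for t, p in zip(y_true, y_pred):
        # Fail loudly on unexpected labels instead of silently dropping them.
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError(f"non-binary label: true={t!r}, pred={p!r}")
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1

    # One shared denominator for all four cells, so they must sum to 100.
    total = len(y_true)
    return {k: 100.0 * counts[k] / total for k in ("tp", "tn", "fp", "fn")}


# Made-up labels, only to exercise the check.
pct = confusion_percentages([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0, 1, 0])
assert abs(sum(pct.values()) - 100.0) < 1e-9
```

if the real numbers fail this check, likely things to look at are a different denominator per cell, examples skipped before counting, or rounding applied before summing.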