Due to a mistake, the original dataset was lost.
The data provided here uses the same split of names between the train and test sets as the original.
The random modifications differ, however, because the entries were generated again.
This should not affect the performance, and the new set can be used
without any restriction to reproduce the claimed performance.
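
To illustrate why regenerating the entries leaves the train/test split intact, here is a minimal Python sketch. It is not the code used to build this dataset: the function names, the seeds, the example names, and the "swap two adjacent characters" modification are all hypothetical stand-ins, since the actual modifications are not described here. The only point it shows is that the name split and the random modifications can be driven by independent random sources.

```python
import random


def split_names(names, test_fraction=0.2, split_seed=0):
    """Deterministically split names into (train, test) with a fixed seed."""
    ordered = sorted(names)  # sort first so the input order does not matter
    random.Random(split_seed).shuffle(ordered)
    n_test = round(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]


def modify(name, rng):
    """Hypothetical modification: swap two adjacent characters."""
    if len(name) < 2:
        return name
    i = rng.randrange(len(name) - 1)
    chars = list(name)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def build_entries(names, modification_seed):
    """Pair every name with a randomly modified variant."""
    rng = random.Random(modification_seed)
    return [(name, modify(name, rng)) for name in names]


# Hypothetical example names, for illustration only.
names = ["alice", "bob", "carol", "dave", "erin",
         "frank", "grace", "heidi", "ivan", "judy"]

# The split depends only on the names and the split seed.
train_a, test_a = split_names(names)
train_b, test_b = split_names(names)

# The modifications use their own seed, so they can differ between runs
# while the underlying names in each set stay the same.
original = build_entries(train_a, modification_seed=1)
regenerated = build_entries(train_b, modification_seed=2)
```

Under these assumptions, both runs place exactly the same names in the train and test sets, and only the modified variants paired with each name may change.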