another small optimization for soft/omp.
why compute r via sqrt(rsq) when we can compare against rsq directly? also add a test against small r to the single() method.
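
a minimal sketch of the idea, not the actual soft/omp code. it assumes the comparison in question is the cutoff test and that the potential has the cosine form E = A*(1 + cos(pi*r/rc)). the function name soft_fpair, its parameters, and the SMALL threshold value are hypothetical and chosen only for illustration.

```cpp
#include <cmath>

// hypothetical threshold: below this r the force factor is set to zero
// instead of dividing by a vanishing distance.
constexpr double SMALL = 1.0e-4;
constexpr double PI = 3.14159265358979323846;

// returns F/r for an assumed cosine soft potential E = A*(1 + cos(pi*r/rc)),
// or 0 outside the cutoff. prefactor plays the role of A, cut the role of rc.
double soft_fpair(double rsq, double prefactor, double cut) {
    const double cutsq = cut * cut;
    if (rsq >= cutsq) return 0.0;      // cutoff test on squared distances, no sqrt
    const double r = std::sqrt(rsq);   // only evaluated for pairs inside the cutoff
    if (r <= SMALL) return 0.0;        // small-r test: avoid division by r near 0
    const double arg = PI * r / cut;
    return prefactor * std::sin(arg) * PI / cut / r;
}
```

pairs outside the cutoff are rejected before any sqrt is evaluated, and the small-r test keeps the division by r from blowing up for overlapping particles.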