add multi-threaded Stillinger-Weber pair style
lightly tested and appears to work correctly.
it contains several optimizations:
- the twobody and threebody functions are moved into the header so the compiler can inline them.
- templated functions replace if statements on values that are constant for a given call, so those branches are resolved at compile time.
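
a minimal sketch of the template technique, for illustration only. the names
accumulate, compute and EFLAG and the toy force/energy terms are hypothetical
and are not taken from the actual pair style. the flag is tested once per call
and then becomes a compile-time constant inside the loop:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical inlined kernel. EFLAG is a template parameter, so in each
// instantiation `if (EFLAG)` is a compile-time constant and the compiler
// can drop the dead branch from the inner loop.
template <int EFLAG>
inline double accumulate(const std::vector<double> &r, std::vector<double> &f)
{
  double energy = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i) {
    const double rinv = 1.0 / r[i];
    f[i] = rinv * rinv;           // toy "force" term, always computed
    if (EFLAG) energy += rinv;    // toy energy term, only when requested
  }
  return energy;
}

// Runtime dispatch: the flag is checked once per call instead of once per pair.
inline double compute(int eflag, const std::vector<double> &r, std::vector<double> &f)
{
  return eflag ? accumulate<1>(r, f) : accumulate<0>(r, f);
}
```

with more than one flag, the dispatch grows to one instantiation per flag
combination, which trades some code size for branch-free inner loops.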