lammps/src/USER-INTELb0ffa46a3bb9lammm-master-deprecated
USER-INTEL
README
LAMMPS Intel(R) Package -------------------------------- W. Michael Brown (Intel) michael.w.brown at intel.com Rodrigo Canales (RWTH Aachen University) Markus H�hnerbach (RWTH Aachen University) Ahmed E. Ismail (RWTH Aachen University) Paolo Bientinesi (RWTH Aachen University) Anupama Kurpad (Intel) Biswajit Mishra (Shell)
This package is based on the USER-OMP package and provides LAMMPS styles that:
- include support for single and mixed precision in addition to double.
- include modifications to support vectorization for key routines
- include modifications to support offload to Intel(R) Xeon Phi(TM) coprocessors
When using the suffix command with "intel", intel styles will be used if they exist. If the suffix command is used with "hybrid intel omp" and the USER-OMP USER-OMP styles will be used whenever USER-INTEL styles are not available. This allow for running most styles in LAMMPS with threading. For example, in the latter case with the USER-OMP package installed,
kspace_style pppm 1e-4
is equivalent to:
kspace_style pppm/omp 1e-4
because no pppm style has been implemented for the Intel package.
In order to use offload to Intel(R) Xeon Phi(TM) coprocessors, the flag -DLMP_INTEL_OFFLOAD should be set in the Makefile. Offload requires the use of Intel compilers.
For portability reasons, vectorization directives are currently only enabled for Intel compilers. Using other compilers may result in significantly lower performance. This behavior can be changed by defining LMP_SIMD_COMPILER for the preprocessor (see intel_preprocess.h).
By default, when running with offload to Intel(R) coprocessors, affinity for host MPI tasks and OpenMP threads is set automatically within the code. This currently requires the use of system calls. To disable at build time, compile with -DINTEL_OFFLOAD_NOAFFINITY.
Vector intrinsics are temporarily being used for the Stillinger-Weber potential to allow for advanced features in the AVX512 instruction set to be exploited on early hardware. We hope to see compiler improvements for AVX512 that will eliminate this requirement, so it is not recommended to develop code based on the intrinsics implementation. Please e-mail the authors for more details.