README
LAMMPS benchmark problems
This directory contains 5 benchmark problems which are discussed in the Benchmark section of the LAMMPS documentation, and on the Benchmark page of the LAMMPS WWW site (lammps.sandia.gov/bench).
This directory also has several sub-directories:
FERMI       benchmark scripts for a desktop machine with Fermi GPUs (Tesla)
KEPLER      benchmark scripts for a GPU cluster with Kepler GPUs
POTENTIALS  benchmark scripts for various potentials in LAMMPS
The results for all of these benchmarks are displayed and discussed on the Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
The remainder of this file refers to the 5 problems in the top-level of this directory and how to run them on CPUs, either in serial or parallel. The sub-directories have their own README files which you should refer to before running those scripts.
Each of the 5 problems has 32,000 atoms and runs for 100 timesteps. Each can be run as a serial benchmark (on one processor) or in parallel. In parallel, each benchmark can be run as a fixed-size or scaled-size problem. For fixed-size benchmarking, the same 32K atom problem is run on various numbers of processors. For scaled-size benchmarking, the model size is increased with the number of processors. E.g. on 8 processors, a 256K-atom problem is run; on 1024 processors, a 32-million atom problem is run, etc.
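The scaled-size arithmetic above can be checked directly: each replication of the box multiplies the 32,000-atom base problem, so the total atom count is 32000 times the overall replication factor (which equals the processor count when the problem is scaled one-to-one with processors).

```shell
# Scaled problem size = 32000 atoms * overall replication factor.
echo $((32000 * 8))      # 8 processors  -> 256000 atoms (256K)
echo $((32000 * 1024))   # 1024 processors -> 32768000 atoms (~32 million)
```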
A few sample log file outputs on different machines and different numbers of processors are included in this directory to compare your answers to. E.g. a log file like log.date.chain.lmp.scaled.foo.P is for a scaled-size version of the Chain benchmark, run on P processors of machine "foo" with the dated version of LAMMPS. Note that the Eam and Lj benchmarks may not give identical answers on different machines because of the "velocity loop geom" option that assigns velocities based on atom coordinates - see the discussion in the documentation for the velocity command for details.
The CPU time (in seconds) for the run is in the "Loop time" line of the log files, e.g.
Loop time of 3.89418 on 8 procs for 100 steps with 32000 atoms
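To collect timings from many log files at once, a short awk one-liner over the "Loop time" lines works; the file name log.demo below is made up for illustration, and the sample line stands in for a real LAMMPS log.

```shell
# Write a sample "Loop time" line (stand-in for a real LAMMPS log file),
# then print its 4th field: the CPU time in seconds.
echo 'Loop time of 3.89418 on 8 procs for 100 steps with 32000 atoms' > log.demo
awk '/^Loop time/ {print $4}' log.demo
```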
Timing results for these problems run on various machines are listed on the Benchmarks page of the LAMMPS WWW Site.
These are the 5 benchmark problems:
LJ = atomic fluid, Lennard-Jones potential with 2.5 sigma cutoff (55 neighbors per atom), NVE integration
Chain = bead-spring polymer melt of 100-mer chains, FENE bonds and LJ pairwise interactions with a 2^(1/6) sigma cutoff (5 neighbors per atom), NVE integration
EAM = metallic solid, Cu EAM potential with 4.95 Angstrom cutoff (45 neighbors per atom), NVE integration
Chute = granular chute flow, frictional history potential with 1.1 sigma cutoff (7 neighbors per atom), NVE integration
Rhodo = rhodopsin protein in solvated lipid bilayer, CHARMM force field with a 10 Angstrom LJ cutoff (440 neighbors per atom), particle-particle particle-mesh (PPPM) for long-range Coulombics, NPT integration
Here is a src/Make.py command which will perform a parallel build of a LAMMPS executable "lmp_mpi" with all the packages needed by all the examples. This assumes you have an MPI library installed on your machine so that "mpicxx" can be used as the wrapper compiler. It also assumes you have an Intel compiler to use as the base compiler. You can leave off the "-cc mpi wrap=icc" switch if that is not the case. You can also leave off the "-fft fftw3" switch if you do not have FFTW (v3) installed as an FFT package, in which case the default KISS FFT library will be used.
cd src
Make.py -j 16 -p none molecule manybody kspace granular rigid orig \
        -cc mpi wrap=icc -fft fftw3 -a file mpi
Here is how to run each problem, assuming the LAMMPS executable is named lmp_mpi, and you are using the mpirun command to launch parallel runs:
Serial (one processor runs):
lmp_mpi < in.lj
lmp_mpi < in.chain
lmp_mpi < in.eam
lmp_mpi < in.chute
lmp_mpi < in.rhodo
Parallel fixed-size runs (on 8 procs in this case):
mpirun -np 8 lmp_mpi < in.lj
mpirun -np 8 lmp_mpi < in.chain
mpirun -np 8 lmp_mpi < in.eam
mpirun -np 8 lmp_mpi < in.chute
mpirun -np 8 lmp_mpi < in.rhodo
Parallel scaled-size runs (on 16 procs in this case):
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled
For each of the scaled-size runs you must set 3 variables as -var command line switches. The variables x,y,z are used in the input scripts to scale up the problem size in each dimension. Imagine the P processors arrayed as a 3d grid, so that P = Px * Py * Pz. For P = 16, you might use Px = 2, Py = 2, Pz = 4. To scale up equally in all dimensions you roughly want Px = Py = Pz. Using the var switches, set x = Px, y = Py, and z = Pz.
For Chute runs, you must have Pz = 1. Therefore P = Px * Py and you only need to set variables x and y.
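The grid arithmetic above can be sketched in the shell; the particular Px, Py, Pz values are just the 16-processor example from this section.

```shell
# 3d processor grid for most benchmarks: P = Px * Py * Pz
P=16; x=2; y=2; z=4
test $((x * y * z)) -eq "$P" && echo "3d grid ok"

# Chute requires Pz = 1, so P = Px * Py and only x and y are set
x=4; y=4
test $((x * y)) -eq "$P" && echo "chute grid ok"
```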