FERMI
README
These are input scripts used to run versions of several of the benchmarks in the top-level bench directory using the GPU and USER-CUDA accelerator packages. The results of running these scripts on two different machines (a desktop with 2 Tesla GPUs and the ORNL Titan supercomputer) are shown in the "GPU (Fermi)" section of the Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
Examples of how to run these scripts are shown below. They assume you have built 3 executables with both the GPU and USER-CUDA packages installed, e.g.
lmp_linux_single lmp_linux_mixed lmp_linux_double
The precision (single, mixed, double) refers to the GPU and USER-CUDA package precision. See the README files in the lib/gpu and lib/cuda directories for instructions on how to build the packages with different precisions. The GPU and USER-CUDA sub-sections of the doc/Section_accelerate.html file also describe this process.
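As a rough, illustrative sketch only (the exact Makefile names and precision flags below are assumptions; the lib/gpu and lib/cuda READMEs for your LAMMPS version are authoritative), a single-precision build of the GPU library and executable might look like:

cd lib/gpu
# set CUDA_PRECISION in Makefile.linux to -D_SINGLE_SINGLE (single),
# -D_SINGLE_DOUBLE (mixed), or -D_DOUBLE_DOUBLE (double)
make -f Makefile.linux
cd ../../src
make yes-gpu
make linux        # produces lmp_linux; copy/rename it, e.g. to lmp_linux_single

The USER-CUDA library in lib/cuda has an analogous precision setting; build it there before installing the USER-CUDA package with "make yes-user-cuda".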
To run on just CPUs (without using the GPU or USER-CUDA styles), do something like the following:
mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
mpirun -np 12 lmp_linux_double -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
The "xyz" settings determine the problem size. The "t" setting determines the number of timesteps.
These mpirun commands run on a single node. To run on multiple nodes, scale up the "-np" setting.
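For example, a hypothetical 4-node run with 12 cores per node (48 MPI tasks total; how tasks are placed on nodes depends on your MPI launcher and batch system) might look like:

mpirun -np 48 lmp_linux_double -v x 32 -v y 32 -v z 32 -v t 100 < in.lj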
To run with the GPU package, do something like the following:
mpirun -np 12 lmp_linux_single -sf gpu -pk gpu 1 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj
mpirun -np 8 lmp_linux_mixed -sf gpu -pk gpu 2 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj
The "xyz" settings determine the problem size. The "t" setting determines the number of timesteps. The "np" setting determines how many MPI tasks (per node) the problem will run on, The numeric argument to the "-pk" setting is the number of GPUs (per node). Note that you can use more MPI tasks than GPUs (per node) with the GPU package.
These mpirun commands run on a single node. To run on multiple nodes, scale up the "-np" setting, and control the number of MPI tasks per node via a "-ppn" setting.
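For example, a hypothetical 2-node run with 12 MPI tasks and 2 GPUs per node might look like the following ("-ppn" is the MPICH-style tasks-per-node flag; OpenMPI uses "--npernode" instead):

mpirun -np 24 -ppn 12 lmp_linux_mixed -sf gpu -pk gpu 2 -v x 64 -v y 64 -v z 64 -v t 100 < in.lj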
To run with the USER-CUDA package, use the scripts with "cuda" in their names; these are meant to be run with the USER-CUDA package. For example:
mpirun -np 1 ../lmp_linux_single -c on -sf cuda -v g 1 -v x 16 -v y 16 -v z 16 -v t 100 < in.lj.cuda
mpirun -np 2 ../lmp_linux_double -c on -sf cuda -v g 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam.cuda
The "xyz" settings determine the problem size. The "t" setting determines the number of timesteps. The "np" setting determines how many MPI tasks per compute node the problem will run on, and the "g" setting determines how many GPUs per compute node the problem will run on, i.e. 1 or 2 in this case. For the USER-CUDA package, the number of MPI tasks and GPUs (both per compute node) must be equal.
These mpirun commands run on a single node. To run on multiple nodes, scale up the "-np" setting.
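For example, a hypothetical 4-node run with 2 GPUs per node (so 2 MPI tasks per node, to match the GPU count; "-ppn" is the MPICH-style tasks-per-node flag) might look like:

mpirun -np 8 -ppn 2 ../lmp_linux_double -c on -sf cuda -v g 2 -v x 64 -v y 64 -v z 64 -v t 100 < in.eam.cuda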
If the script has "titan" in its name, it was run on the Titan supercomputer at ORNL.