Section_perf.html
No OneTemporary
Actions

Subscribers

None

File Metadata

Created: Tue, Nov 5, 22:27

Section_perf.html
View Options

	<HTML>
	<CENTER><A HREF = "Section_example.html">Previous Section</A> - <A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> - <A HREF = "Manual.html">LAMMPS Documentation</A> - <A HREF = "Section_commands.html#comm">LAMMPS Commands</A> - <A HREF = "Section_tools.html">Next Section</A>
	</CENTER>






	<HR>

	<H3>8. Performance & scalability
	</H3>
	<P>LAMMPS performance on several prototypical benchmarks and machines is
	discussed on the Benchmarks page of the <A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> where
	CPU timings and parallel efficiencies are listed. Here, the
	benchmarks are described briefly and some useful rules of thumb about
	their performance are highlighted.
	</P>
	<P>These are the 5 benchmark problems:
	</P>
	<OL><LI>LJ = atomic fluid, Lennard-Jones potential with 2.5 sigma cutoff (55
	neighbors per atom), NVE integration

	<LI>Chain = bead-spring polymer melt of 100-mer chains, FENE bonds and LJ
	pairwise interactions with a 2^(1/6) sigma cutoff (5 neighbors per
	atom), NVE integration

	<LI>EAM = metallic solid, Cu EAM potential with 4.95 Angstrom cutoff (45
	neighbors per atom), NVE integration

	<LI>Chute = granular chute flow, frictional history potential with 1.1
	sigma cutoff (7 neighbors per atom), NVE integration

	<LI>Rhodo = rhodopsin protein in solvated lipid bilayer, CHARMM force
	field with a 10 Angstrom LJ cutoff (440 neighbors per atom),
	particle-particle particle-mesh (PPPM) for long-range Coulombics, NPT
	integration
	</OL>
	<P>The input files for running the benchmarks are included in the LAMMPS
	distribution, as are sample output files. Each of the 5 problems has
	32,000 atoms and runs for 100 timesteps. Each can be run as a serial
	benchmarks (on one processor) or in parallel. In parallel, each
	benchmark can be run as a fixed-size or scaled-size problem. For
	fixed-size benchmarking, the same 32K atom problem is run on various
	numbers of processors. For scaled-size benchmarking, the model size
	is increased with the number of processors. E.g. on 8 processors, a
	256K-atom problem is run; on 1024 processors, a 32-million atom
	problem is run, etc.
	</P>
	<P>A useful metric from the benchmarks is the CPU cost per atom per
	timestep. Since LAMMPS performance scales roughly linearly with
	problem size and timesteps, the run time of any problem using the same
	model (atom style, force field, cutoff, etc) can then be estimated.
	For example, on a 1.7 GHz Pentium desktop machine (Intel icc compiler
	under Red Hat Linux), the CPU run-time in seconds/atom/timestep for
	the 5 problems is
	</P>
	<DIV ALIGN=center><TABLE BORDER=1 >
	<TR ALIGN="center"><TD ALIGN ="right">Problem:</TD><TD > LJ</TD><TD > Chain</TD><TD > EAM</TD><TD > Chute</TD><TD > Rhodopsin</TD></TR>
	<TR ALIGN="center"><TD ALIGN ="right">CPU/atom/step:</TD><TD > 4.55E-6</TD><TD > 2.18E-6</TD><TD > 9.38E-6</TD><TD > 2.18E-6</TD><TD > 1.11E-4</TD></TR>
	<TR ALIGN="center"><TD ALIGN ="right">Ratio to LJ:</TD><TD > 1.0</TD><TD > 0.48</TD><TD > 2.06</TD><TD > 0.48</TD><TD > 24.5
	</TD></TR></TABLE></DIV>

	<P>The ratios mean that if the atomic LJ system has a normalized cost of
	1.0, the bead-spring chains and granular systems run 2x faster, while
	the EAM metal and solvated protein models run 2x and 25x slower
	respectively. The bulk of these cost differences is due to the
	expense of computing a particular pairwise force field for a given
	number of neighbors per atom.
	</P>
	<P>Performance on a parallel machine can also be predicted from the
	one-processor timings if the parallel efficiency can be estimated.
	The communication bandwidth and latency of a particular parallel
	machine affects the efficiency. On most machines LAMMPS will give
	fixed-size parallel efficiencies on these benchmarks above 50% so long
	as the atoms/processor count is a few 100 or greater - i.e. on 64 to
	128 processors. Likewise, scaled-size parallel efficiencies will
	typically be 80% or greater up to very large processor counts. The
	benchmark data on the <A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> gives specific examples on
	some different machines, including a run of 3/4 of a billion LJ atoms
	on 1500 processors that ran at 85% parallel efficiency.
	</P>
	</HTML>

Section_perf.htmlNo OneTemporaryActions

File Metadata

Section_perf.htmlView Options

Event Timeline

Section_perf.html
No OneTemporary
Actions

Section_perf.html
View Options