	<!DOCTYPE html>
	<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
	<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
	<head>
	<meta charset="utf-8">

	<meta name="viewport" content="width=device-width, initial-scale=1.0">

	<title>5.USER-CUDA package — LAMMPS documentation</title>















	<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />



	<link rel="stylesheet" href="_static/sphinxcontrib-images/LightBox2/lightbox2/css/lightbox.css" type="text/css" />



	<link rel="top" title="LAMMPS documentation" href="index.html"/>


	<script src="_static/js/modernizr.min.js"></script>

	</head>

	<body class="wy-body-for-nav" role="document">

	<div class="wy-grid-for-nav">


	<nav data-toggle="wy-nav-shift" class="wy-nav-side">
	<div class="wy-side-nav-search">



	<a href="Manual.html" class="icon icon-home"> LAMMPS



	</a>


	<div role="search">
	<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
	<input type="text" name="q" placeholder="Search docs" />
	<input type="hidden" name="check_keywords" value="yes" />
	<input type="hidden" name="area" value="default" />
	</form>
	</div>


	</div>

	<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">



	<ul>
	<li class="toctree-l1"><a class="reference internal" href="Section_intro.html">1. Introduction</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_start.html">2. Getting Started</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_commands.html">3. Commands</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_packages.html">4. Packages</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_accelerate.html">5. Accelerating LAMMPS performance</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_howto.html">6. How-to discussions</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_example.html">7. Example problems</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_perf.html">8. Performance &amp; scalability</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_tools.html">9. Additional tools</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_modify.html">10. Modifying &amp; extending LAMMPS</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_python.html">11. Python interface to LAMMPS</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_errors.html">12. Errors</a></li>
	<li class="toctree-l1"><a class="reference internal" href="Section_history.html">13. Future and history</a></li>
	</ul>



	</div>

	</nav>

	<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">


	<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
	<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
	<a href="Manual.html">LAMMPS</a>
	</nav>



	<div class="wy-nav-content">
	<div class="rst-content">
	<div role="navigation" aria-label="breadcrumbs navigation">
	<ul class="wy-breadcrumbs">
	<li><a href="Manual.html">Docs</a> »</li>

	<li>5.USER-CUDA package</li>
	<li class="wy-breadcrumbs-aside">


	<a href="http://lammps.sandia.gov">Website</a>
	<a href="Section_commands.html#comm">Commands</a>

	</li>
	</ul>
	<hr/>

	</div>
	<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
	<div itemprop="articleBody">

	<p><a class="reference internal" href="Section_accelerate.html"><span class="doc">Return to Section accelerate overview</span></a></p>
	<div class="section" id="user-cuda-package">
	<h1>5.USER-CUDA package</h1>
	<p>The USER-CUDA package was developed by Christian Trott (Sandia) while
	at the Ilmenau University of Technology in Germany. It provides NVIDIA GPU
	versions of many pair styles, many fixes, a few computes, and long-range
	Coulombics via the PPPM command. It has the following general
	features:</p>
	<ul class="simple">
	<li>The package is designed to allow an entire LAMMPS calculation, for
	many timesteps, to run entirely on the GPU (except for inter-processor
	MPI communication), so that atom-based data (e.g. coordinates, forces)
	do not have to move back-and-forth between the CPU and GPU.</li>
	<li>The speed-up from this approach is typically greater when the
	number of atoms per GPU is large.</li>
	<li>Data will stay on the GPU until a timestep where a non-USER-CUDA fix
	or compute is invoked. Whenever a non-GPU operation occurs (fix,
	compute, output), data automatically moves back to the CPU as needed.
	This may incur a performance penalty, but should otherwise work
	transparently.</li>
	<li>Neighbor lists are constructed on the GPU.</li>
	<li>The package supports only a single MPI task, running on a
	single CPU core, assigned to each GPU.</li>
	</ul>
	<p>Here is a quick overview of how to use the USER-CUDA package:</p>
	<ul class="simple">
	<li>build the library in lib/cuda for your GPU hardware with desired precision</li>
	<li>include the USER-CUDA package and build LAMMPS</li>
	<li>use the mpirun command to specify 1 MPI task per GPU (on each node)</li>
	<li>enable the USER-CUDA package via the “-c on” command-line switch</li>
	<li>specify the # of GPUs per node</li>
	<li>use USER-CUDA styles in your input script</li>
	</ul>
	<p>The latter two steps can be done using the “-pk cuda” and “-sf cuda”
	<a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switches</span></a> respectively. Or
	the effect of the “-pk” or “-sf” switches can be duplicated by adding
	the <a class="reference internal" href="package.html"><span class="doc">package cuda</span></a> or <a class="reference internal" href="suffix.html"><span class="doc">suffix cuda</span></a> commands
	respectively to your input script.</p>
	<p><strong>Required hardware/software:</strong></p>
	<p>To use this package, you need one or more NVIDIA GPUs and the
	NVIDIA CUDA software installed on your system.</p>
	<p>Your NVIDIA GPU needs to support Compute Capability 1.3 or higher. This list may
	help you to find out the Compute Capability of your card:</p>
	<p><a class="reference external" href="http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units">http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units</a></p>
	<p>Install the NVIDIA CUDA Toolkit (version 3.2 or higher) and the
	corresponding GPU drivers. The NVIDIA CUDA SDK is not required, but
	we recommend installing it as well, so that you can verify that its
	sample projects compile without problems.</p>
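	<p>A quick way to confirm these prerequisites is to query the toolkit and the
	driver from a shell. The following is a minimal sketch, assuming that the
	toolkit's <em>nvcc</em> compiler and the driver's <em>nvidia-smi</em> utility
	are on your PATH. Neither check is part of LAMMPS, the HAVE_NVCC and
	HAVE_SMI variables are illustrative names, and each check is skipped with a
	message if the tool is missing:</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span># Report the CUDA Toolkit release (this package expects 3.2 or higher).
	if command -v nvcc; then
	  nvcc --version
	  HAVE_NVCC=yes
	else
	  echo "nvcc not found on PATH"
	  HAVE_NVCC=no
	fi
	# List the GPUs the driver can see; look up each model's Compute Capability.
	if command -v nvidia-smi; then
	  nvidia-smi -L
	  HAVE_SMI=yes
	else
	  echo "nvidia-smi not found on PATH"
	  HAVE_SMI=no
	fi
	</pre></div>
	</div>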
	<p><strong>Building LAMMPS with the USER-CUDA package:</strong></p>
	<p>This requires two steps (a,b): build the USER-CUDA library, then build
	LAMMPS with the USER-CUDA package.</p>
	<p>You can do both these steps in one line, using the src/Make.py script,
	described in <a class="reference internal" href="Section_start.html#start-4"><span class="std std-ref">Section 2.4</span></a> of the manual.
	Type “Make.py -h” for help. If run from the src directory, this
	command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
	starting Makefile.machine:</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Make</span><span class="o">.</span><span class="n">py</span> <span class="o">-</span><span class="n">p</span> <span class="n">cuda</span> <span class="o">-</span><span class="n">cuda</span> <span class="n">mode</span><span class="o">=</span><span class="n">single</span> <span class="n">arch</span><span class="o">=</span><span class="mi">20</span> <span class="o">-</span><span class="n">o</span> <span class="n">cuda</span> <span class="o">-</span><span class="n">a</span> <span class="n">lib</span><span class="o">-</span><span class="n">cuda</span> <span class="n">file</span> <span class="n">mpi</span>
	</pre></div>
	</div>
	<p>Or you can follow these two (a,b) steps:</p>
	<ol class="loweralpha simple">
	<li>Build the USER-CUDA library</li>
	</ol>
	<p>The USER-CUDA library is in lammps/lib/cuda. If your <em>CUDA</em> toolkit
	is not installed in the default system directory <em>/usr/local/cuda</em>, edit
	the file <em>lib/cuda/Makefile.common</em> accordingly.</p>
	<p>To build the library with the settings in lib/cuda/Makefile.defaults,
	simply type:</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">make</span>
	</pre></div>
	</div>
	<p>To set options when the library is built, type “make OPTIONS”, where
	<em>OPTIONS</em> are one or more of the following. The settings will be
	written to the <em>lib/cuda/Makefile.defaults</em> before the build.</p>
	<pre class="literal-block">
	<em>precision=N</em> to set the precision level
	  N = 1 for single precision (default)
	  N = 2 for double precision
	  N = 3 for positions in double precision
	  N = 4 for positions and velocities in double precision
	<em>arch=M</em> to set GPU compute capability
	  M = 35 for Kepler GPUs
	  M = 20 for CC2.0 (GF100/110, e.g. C2050, GTX580, GTX470) (default)
	  M = 21 for CC2.1 (GF104/114, e.g. GTX560, GTX460, GTX450)
	  M = 13 for CC1.3 (GT200, e.g. C1060, GTX285)
	<em>prec_timer=0/1</em> to use hi-precision timers
	  0 = do not use them (default)
	  1 = use them
	  this is usually only useful for Mac machines
	<em>dbg=0/1</em> to activate debug mode
	  0 = no debug mode (default)
	  1 = yes debug mode
	  this is only useful for developers
	<em>cufft=0/1</em> for use of the CUDA FFT library
	  0 = no CUFFT support (default)
	  1 = use the CUFFT library
	  in the future other CUDA-enabled FFT libraries might be supported
	</pre>
	<p>If the build is successful, it will produce the files liblammpscuda.a and
	Makefile.lammps.</p>
	<p>Note that if you change any of the options (like precision), you need
	to re-build the entire library. Do a “make clean” first, followed by
	“make”.</p>
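	<p>As an illustration of the options and the rebuild rule above, the
	following sketch rebuilds the library in double precision for a Kepler GPU
	with CUFFT enabled. The LAMMPS_DIR location and the LIB_OPTIONS variable
	are assumptions made for this example; the option names and values are
	those listed above. The build is skipped with a message if the directory
	does not exist:</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span># Assumed location of the LAMMPS source tree; adjust to your checkout.
	LAMMPS_DIR=${LAMMPS_DIR:-$HOME/lammps}
	# Double precision, Kepler architecture, CUFFT enabled.
	LIB_OPTIONS="precision=2 arch=35 cufft=1"
	if [ -d "$LAMMPS_DIR/lib/cuda" ]; then
	  cd "$LAMMPS_DIR/lib/cuda"
	  make clean          # discard objects built with the old settings
	  make $LIB_OPTIONS   # unquoted on purpose so each option is a separate argument
	else
	  echo "lib/cuda not found under $LAMMPS_DIR"
	fi
	</pre></div>
	</div>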
	<ol class="loweralpha simple" start="2">
	<li>Build LAMMPS with the USER-CUDA package</li>
	</ol>
	<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">cd</span> <span class="n">lammps</span><span class="o">/</span><span class="n">src</span>
	<span class="n">make</span> <span class="n">yes</span><span class="o">-</span><span class="n">user</span><span class="o">-</span><span class="n">cuda</span>
	<span class="n">make</span> <span class="n">machine</span>
	</pre></div>
	</div>
	<p>No additional compile/link flags are needed in Makefile.machine.</p>
	<p>Note that if you change the USER-CUDA library precision (discussed
	above) and rebuild the USER-CUDA library, then you also need to
	re-install the USER-CUDA package and re-build LAMMPS, so that all
	affected files are re-compiled and linked to the new USER-CUDA
	library.</p>
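	<p>One way to carry out that re-install and re-build is sketched below. It
	assumes the standard LAMMPS <em>make no-package</em> target for removing a
	package, a source tree at the hypothetical LAMMPS_DIR location, and
	<em>mpi</em> as the machine target (matching src/MAKE/Makefile.mpi mentioned
	above); substitute your own values:</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span># Assumed location of the LAMMPS source tree and name of the machine target.
	LAMMPS_DIR=${LAMMPS_DIR:-$HOME/lammps}
	MACHINE=${MACHINE:-mpi}
	if [ -d "$LAMMPS_DIR/src" ]; then
	  cd "$LAMMPS_DIR/src"
	  make no-user-cuda    # remove the installed USER-CUDA package files
	  make yes-user-cuda   # install them again against the rebuilt library
	  make "$MACHINE"      # re-compile and re-link LAMMPS
	else
	  echo "src not found under $LAMMPS_DIR"
	fi
	</pre></div>
	</div>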
	<p><strong>Run with the USER-CUDA package from the command line:</strong></p>
	<p>The mpirun or mpiexec command sets the total number of MPI tasks used
	by LAMMPS (one or multiple per compute node) and the number of MPI
	tasks used per node. For example, the mpirun command in MPICH does this via
	its -np and -ppn switches; OpenMPI uses -np and -npernode.</p>
	<p>When using the USER-CUDA package, you must use exactly one MPI task
	per physical GPU.</p>
	<p>You must use the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> to enable the USER-CUDA package.
	The “-c on” switch also issues a default <a class="reference internal" href="package.html"><span class="doc">package cuda 1</span></a>
	command which sets various USER-CUDA options to default values, as
	discussed on the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command doc page.</p>
	<p>Use the “-sf cuda” <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a>,
	which will automatically append “cuda” to styles that support it. Use
	the “-pk cuda Ng” <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> to
	set Ng = # of GPUs per node to a different value than the default set
	by the “-c on” switch (1 GPU) or change other <a class="reference internal" href="package.html"><span class="doc">package cuda</span></a> options.</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">lmp_machine</span> <span class="o">-</span><span class="n">c</span> <span class="n">on</span> <span class="o">-</span><span class="n">sf</span> <span class="n">cuda</span> <span class="o">-</span><span class="n">pk</span> <span class="n">cuda</span> <span class="mi">1</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 1 MPI task uses 1 GPU</span>
	<span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">2</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">c</span> <span class="n">on</span> <span class="o">-</span><span class="n">sf</span> <span class="n">cuda</span> <span class="o">-</span><span class="n">pk</span> <span class="n">cuda</span> <span class="mi">2</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 2 MPI tasks use 2 GPUs on a single 16-core (or whatever) node</span>
	<span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">24</span> <span class="o">-</span><span class="n">ppn</span> <span class="mi">2</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">c</span> <span class="n">on</span> <span class="o">-</span><span class="n">sf</span> <span class="n">cuda</span> <span class="o">-</span><span class="n">pk</span> <span class="n">cuda</span> <span class="mi">2</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># ditto on 12 16-core nodes</span>
	</pre></div>
	</div>
	<p>The syntax for the “-pk” switch is the same as that of the “package
	cuda” command. See the <a class="reference internal" href="package.html"><span class="doc">package</span></a> command doc page for
	details, including the default values used for all its options if it
	is not specified.</p>
	<p>Note that the default for the <a class="reference internal" href="package.html"><span class="doc">package cuda</span></a> command is
	to set the Newton flag to “off” for both pairwise and bonded
	interactions. This typically gives fastest performance. If the
	<a class="reference internal" href="newton.html"><span class="doc">newton</span></a> command is used in the input script, it can
	override these defaults.</p>
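	<p>Because exactly one MPI task is used per GPU, the total task count
	follows directly from the number of nodes and the number of GPUs per node.
	The sketch below assembles an MPICH-style launch line from those two values;
	the variable names are illustrative, and lmp_machine and in.script are the
	same placeholders used in the examples above. With 12 nodes and 2 GPUs per
	node it reproduces the third example:</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span>NODES=12           # number of compute nodes (example value)
	GPUS_PER_NODE=2    # GPUs on each node, also passed to -pk cuda
	# One MPI task per GPU: total tasks = nodes times GPUs per node.
	NP=$((NODES * GPUS_PER_NODE))
	LAUNCH="mpirun -np $NP -ppn $GPUS_PER_NODE lmp_machine -c on -sf cuda -pk cuda $GPUS_PER_NODE -in in.script"
	# Print the command for inspection instead of running it.
	echo "$LAUNCH"
	</pre></div>
	</div>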
	<p><strong>Or run with the USER-CUDA package by editing an input script:</strong></p>
	<p>The discussion above of the mpirun/mpiexec command and the requirement
	of one MPI task per GPU applies here as well.</p>
	<p>You must still use the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> to enable the USER-CUDA package.</p>
	<p>Use the <a class="reference internal" href="suffix.html"><span class="doc">suffix cuda</span></a> command, or you can explicitly add a
	“cuda” suffix to individual styles in your input script, e.g.</p>
	<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">pair_style</span> <span class="n">lj</span><span class="o">/</span><span class="n">cut</span><span class="o">/</span><span class="n">cuda</span> <span class="mf">2.5</span>
	</pre></div>
	</div>
	<p>You only need to use the <a class="reference internal" href="package.html"><span class="doc">package cuda</span></a> command if you
	wish to change any of its option defaults, including the number of
	GPUs/node (default = 1), as set by the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a>.</p>
	<p><strong>Speed-ups to expect:</strong></p>
	<p>The performance of a GPU versus a multi-core CPU is a function of your
	hardware, which pair style is used, the number of atoms/GPU, and the
	precision used on the GPU (double, single, mixed).</p>
	<p>See the <a class="reference external" href="http://lammps.sandia.gov/bench.html">Benchmark page</a> of the
	LAMMPS web site for performance of the USER-CUDA package on different
	hardware.</p>
	<p><strong>Guidelines for best performance:</strong></p>
	<ul class="simple">
	<li>The USER-CUDA package offers more speed-up relative to CPU performance
	when the number of atoms per GPU is large, e.g. on the order of tens
	or hundreds of thousands.</li>
	<li>As noted above, this package will continue to run a simulation
	entirely on the GPU(s) (except for inter-processor MPI communication),
	for multiple timesteps, until a CPU calculation is required, either by
	a fix or compute that is non-GPU-ized, or until output is performed
	(thermo or dump snapshot or restart file). The less often this
	occurs, the faster your simulation will run.</li>
	</ul>
	<div class="section" id="restrictions">
	<h2>Restrictions</h2>
	<p>None.</p>
	</div>
	</div>


	</div>
	</div>
	<footer>


	<hr/>

	<div role="contentinfo">
	<p>
	© Copyright 2013 Sandia Corporation.
	</p>
	</div>
	Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.

	</footer>

	</div>
	</div>

	</section>

	</div>





	<script type="text/javascript">
	var DOCUMENTATION_OPTIONS = {
	URL_ROOT:'./',
	VERSION:'',
	COLLAPSE_INDEX:false,
	FILE_SUFFIX:'.html',
	HAS_SOURCE: true
	};
	</script>
	<script type="text/javascript" src="_static/jquery.js"></script>
	<script type="text/javascript" src="_static/underscore.js"></script>
	<script type="text/javascript" src="_static/doctools.js"></script>
	<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
	<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/jquery-1.11.0.min.js"></script>
	<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/lightbox.min.js"></script>
	<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2-customize/jquery-noconflict.js"></script>





	<script type="text/javascript" src="_static/js/theme.js"></script>




	<script type="text/javascript">
	jQuery(function () {
	SphinxRtdTheme.StickyNav.enable();
	});
	</script>


	</body>
	</html>
