
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>5. USER-INTEL package &mdash; LAMMPS documentation</title>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/sphinxcontrib-images/LightBox2/lightbox2/css/lightbox.css" type="text/css" />
<link rel="top" title="LAMMPS documentation" href="index.html"/>
<script src="_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-nav-search">
<a href="Manual.html" class="icon icon-home"> LAMMPS
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul>
<li class="toctree-l1"><a class="reference internal" href="Section_intro.html">1. Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_start.html">2. Getting Started</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_commands.html">3. Commands</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_packages.html">4. Packages</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_accelerate.html">5. Accelerating LAMMPS performance</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_howto.html">6. How-to discussions</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_example.html">7. Example problems</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_perf.html">8. Performance &amp; scalability</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_tools.html">9. Additional tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_modify.html">10. Modifying &amp; extending LAMMPS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_python.html">11. Python interface to LAMMPS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_errors.html">12. Errors</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_history.html">13. Future and history</a></li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="Manual.html">LAMMPS</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="Manual.html">Docs</a> &raquo;</li>
<li>5. USER-INTEL package</li>
<li class="wy-breadcrumbs-aside">
<a href="http://lammps.sandia.gov">Website</a>
<a href="Section_commands.html#comm">Commands</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<p><a class="reference internal" href="Section_accelerate.html"><span class="doc">Return to Section accelerate overview</span></a></p>
<div class="section" id="user-intel-package">
<h1>5. USER-INTEL package</h1>
<p>The USER-INTEL package was developed by Mike Brown at Intel
Corporation. It provides two methods for accelerating simulations,
depending on the hardware you have. The first is acceleration on
Intel(R) CPUs by running in single, mixed, or double precision with
vectorization. The second is acceleration on Intel(R) Xeon Phi(TM)
coprocessors via offloading neighbor list and non-bonded force
calculations to the Phi. The same C++ code is used in both cases.
When offloading to a coprocessor from a CPU, the same routine is run
twice, once on the CPU and once with an offload flag.</p>
<p>Note that the USER-INTEL package supports use of the Phi in &#8220;offload&#8221;
mode, not &#8220;native&#8221; mode like the <a class="reference internal" href="accelerate_kokkos.html"><span class="doc">KOKKOS package</span></a>.</p>
<p>Also note that the USER-INTEL package can be used in tandem with the
<a class="reference internal" href="accelerate_omp.html"><span class="doc">USER-OMP package</span></a>. This is useful when
offloading pair style computations to the Phi, so that other styles
not supported by the USER-INTEL package, e.g. bond, angle, dihedral,
improper, and long-range electrostatics, can run simultaneously in
threaded mode on the CPU cores. Since fewer MPI tasks than CPU cores
are typically invoked when running with coprocessors, this lets the
extra CPU cores be used for useful computation.</p>
<p>As illustrated below, if LAMMPS is built with both the USER-INTEL and
USER-OMP packages, this dual mode of operation is made easier to use,
via the &#8220;-suffix hybrid intel omp&#8221; <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> or the <a class="reference internal" href="suffix.html"><span class="doc">suffix hybrid intel omp</span></a> command. Both set a second-choice suffix to &#8220;omp&#8221; so
that styles from the USER-INTEL package will be used if available,
with styles from the USER-OMP package as a second choice.</p>
<p>Here is a quick overview of how to use the USER-INTEL package for CPU
acceleration, assuming one or more 16-core nodes. More details
follow.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">use</span> <span class="n">an</span> <span class="n">Intel</span> <span class="n">compiler</span>
<span class="n">use</span> <span class="n">these</span> <span class="n">CCFLAGS</span> <span class="n">settings</span> <span class="ow">in</span> <span class="n">Makefile</span><span class="o">.</span><span class="n">machine</span><span class="p">:</span> <span class="o">-</span><span class="n">fopenmp</span><span class="p">,</span> <span class="o">-</span><span class="n">DLAMMPS_MEMALIGN</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="o">-</span><span class="n">restrict</span><span class="p">,</span> <span class="o">-</span><span class="n">xHost</span><span class="p">,</span> <span class="o">-</span><span class="n">fno</span><span class="o">-</span><span class="n">alias</span><span class="p">,</span> <span class="o">-</span><span class="n">ansi</span><span class="o">-</span><span class="n">alias</span><span class="p">,</span> <span class="o">-</span><span class="n">override</span><span class="o">-</span><span class="n">limits</span>
<span class="n">use</span> <span class="n">these</span> <span class="n">LINKFLAGS</span> <span class="n">settings</span> <span class="ow">in</span> <span class="n">Makefile</span><span class="o">.</span><span class="n">machine</span><span class="p">:</span> <span class="o">-</span><span class="n">fopenmp</span><span class="p">,</span> <span class="o">-</span><span class="n">xHost</span>
<span class="n">make</span> <span class="n">yes</span><span class="o">-</span><span class="n">user</span><span class="o">-</span><span class="n">intel</span> <span class="n">yes</span><span class="o">-</span><span class="n">user</span><span class="o">-</span><span class="n">omp</span> <span class="c1"># including user-omp is optional</span>
<span class="n">make</span> <span class="n">mpi</span> <span class="c1"># build with the USER-INTEL package, if settings (including compiler) added to Makefile.mpi</span>
<span class="n">make</span> <span class="n">intel_cpu</span> <span class="c1"># or Makefile.intel_cpu already has settings, uses Intel MPI wrapper</span>
<span class="n">Make</span><span class="o">.</span><span class="n">py</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">p</span> <span class="n">intel</span> <span class="n">omp</span> <span class="o">-</span><span class="n">intel</span> <span class="n">cpu</span> <span class="o">-</span><span class="n">a</span> <span class="n">file</span> <span class="n">mpich_icc</span> <span class="c1"># or one-line build via Make.py for MPICH</span>
<span class="n">Make</span><span class="o">.</span><span class="n">py</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">p</span> <span class="n">intel</span> <span class="n">omp</span> <span class="o">-</span><span class="n">intel</span> <span class="n">cpu</span> <span class="o">-</span><span class="n">a</span> <span class="n">file</span> <span class="n">ompi_icc</span> <span class="c1"># or for OpenMPI</span>
<span class="n">Make</span><span class="o">.</span><span class="n">py</span> <span class="o">-</span><span class="n">v</span> <span class="o">-</span><span class="n">p</span> <span class="n">intel</span> <span class="n">omp</span> <span class="o">-</span><span class="n">intel</span> <span class="n">cpu</span> <span class="o">-</span><span class="n">a</span> <span class="n">file</span> <span class="n">intel_cpu</span> <span class="c1"># or for Intel MPI wrapper</span>
</pre></div>
</div>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">intel</span> <span class="o">-</span><span class="n">pk</span> <span class="n">intel</span> <span class="mi">0</span> <span class="n">omp</span> <span class="mi">16</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 1 node, 1 MPI task/node, 16 threads/task, no USER-OMP</span>
<span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">32</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">intel</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 2 nodes, 16 MPI tasks/node, no threads, no USER-OMP</span>
<span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">hybrid</span> <span class="n">intel</span> <span class="n">omp</span> <span class="o">-</span><span class="n">pk</span> <span class="n">intel</span> <span class="mi">0</span> <span class="n">omp</span> <span class="mi">16</span> <span class="o">-</span><span class="n">pk</span> <span class="n">omp</span> <span class="mi">16</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 1 node, 1 MPI task/node, 16 threads/task, with USER-OMP</span>
<span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">32</span> <span class="o">-</span><span class="n">ppn</span> <span class="mi">4</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">hybrid</span> <span class="n">intel</span> <span class="n">omp</span> <span class="o">-</span><span class="n">pk</span> <span class="n">intel</span> <span class="mi">0</span> <span class="n">omp</span> <span class="mi">4</span> <span class="o">-</span><span class="n">pk</span> <span class="n">omp</span> <span class="mi">4</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 8 nodes, 4 MPI tasks/node, 4 threads/task, with USER-OMP</span>
</pre></div>
</div>
<p>Here is a quick overview of how to use the USER-INTEL package for the
same CPUs as above (16 cores/node), with an additional Xeon Phi(TM)
coprocessor per node. More details follow.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">Same</span> <span class="k">as</span> <span class="n">above</span> <span class="k">for</span> <span class="n">building</span><span class="p">,</span> <span class="k">with</span> <span class="n">these</span> <span class="n">additions</span><span class="o">/</span><span class="n">changes</span><span class="p">:</span>
<span class="n">add</span> <span class="n">the</span> <span class="n">flag</span> <span class="o">-</span><span class="n">DLMP_INTEL_OFFLOAD</span> <span class="n">to</span> <span class="n">CCFLAGS</span> <span class="ow">in</span> <span class="n">Makefile</span><span class="o">.</span><span class="n">machine</span>
<span class="n">add</span> <span class="n">the</span> <span class="n">flag</span> <span class="o">-</span><span class="n">offload</span> <span class="n">to</span> <span class="n">LINKFLAGS</span> <span class="ow">in</span> <span class="n">Makefile</span><span class="o">.</span><span class="n">machine</span>
<span class="k">for</span> <span class="n">Make</span><span class="o">.</span><span class="n">py</span> <span class="n">change</span> <span class="s2">&quot;-intel cpu&quot;</span> <span class="n">to</span> <span class="s2">&quot;-intel phi&quot;</span><span class="p">,</span> <span class="ow">and</span> <span class="s2">&quot;file intel_cpu&quot;</span> <span class="n">to</span> <span class="s2">&quot;file intel_phi&quot;</span>
</pre></div>
</div>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">32</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">intel</span> <span class="o">-</span><span class="n">pk</span> <span class="n">intel</span> <span class="mi">1</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 2 nodes, 16 MPI tasks/node, 240 total threads on coprocessor, no USER-OMP</span>
<span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">16</span> <span class="o">-</span><span class="n">ppn</span> <span class="mi">8</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">intel</span> <span class="o">-</span><span class="n">pk</span> <span class="n">intel</span> <span class="mi">1</span> <span class="n">omp</span> <span class="mi">2</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 2 nodes, 8 MPI tasks/node, 2 threads/task, 240 total threads on coprocessor, no USER-OMP</span>
<span class="n">mpirun</span> <span class="o">-</span><span class="n">np</span> <span class="mi">32</span> <span class="o">-</span><span class="n">ppn</span> <span class="mi">8</span> <span class="n">lmp_machine</span> <span class="o">-</span><span class="n">sf</span> <span class="n">hybrid</span> <span class="n">intel</span> <span class="n">omp</span> <span class="o">-</span><span class="n">pk</span> <span class="n">intel</span> <span class="mi">1</span> <span class="n">omp</span> <span class="mi">2</span> <span class="o">-</span><span class="n">pk</span> <span class="n">omp</span> <span class="mi">2</span> <span class="o">-</span><span class="ow">in</span> <span class="ow">in</span><span class="o">.</span><span class="n">script</span> <span class="c1"># 4 nodes, 8 MPI tasks/node, 2 threads/task, 240 total threads on coprocessor, with USER-OMP</span>
</pre></div>
</div>
<p><strong>Required hardware/software:</strong></p>
<p>Your compiler must support the OpenMP interface. Use of an Intel(R)
C++ compiler is recommended, but not required. However, g++ does not
recognize some of the compiler flags listed above, so those flags must be
omitted when using it.
Optimizations for vectorization have only been tested with the
Intel(R) compiler. Use of other compilers may not result in
vectorization, or give poor performance.</p>
<p>The recommended version of the Intel(R) compiler is 14.0.1.106.
Versions 15.0.1.133 and later are also supported. If using Intel(R)
MPI, versions 15.0.2.044 and later are recommended.</p>
<p>To use the offload option, you must have one or more Intel(R) Xeon
Phi(TM) coprocessors and use an Intel(R) C++ compiler.</p>
<p><strong>Building LAMMPS with the USER-INTEL package:</strong></p>
<p>The lines above illustrate how to include and build with the USER-INTEL
package, for either CPU or Phi support, either in two steps using the &#8220;make&#8221;
command or in a single step via the src/Make.py script,
described in <a class="reference internal" href="Section_start.html#start-4"><span class="std std-ref">Section 2.4</span></a> of the manual.
Type &#8220;Make.py -h&#8221; for help. Because the mechanism for specifying which
compiler to use (Intel in this case) differs between MPI
wrappers, three versions of the Make.py command are shown.</p>
<p>Note that if you build with support for a Phi coprocessor, the same
binary can be used on nodes with or without coprocessors installed.
However, if you do not have coprocessors on your system, building
without offload support will produce a smaller binary.</p>
<p>If you also build with the USER-OMP package, you can use styles from
both packages, as described below.</p>
<p>Note that the CCFLAGS and LINKFLAGS settings in Makefile.machine must
include &#8220;-fopenmp&#8221;. Likewise, if you use an Intel compiler, the
CCFLAGS setting must include &#8220;-restrict&#8221;. For Phi support, the
&#8220;-DLMP_INTEL_OFFLOAD&#8221; (CCFLAGS) and &#8220;-offload&#8221; (LINKFLAGS) settings
are required. The other settings listed above are optional, but will
typically improve performance. The Make.py command will add all of
these automatically.</p>
<p>If you are compiling on the same architecture that will be used for
the runs, adding the flag <em>-xHost</em> to CCFLAGS enables vectorization
with the Intel(R) compiler. Otherwise, you must provide the correct
compute node architecture to the -x option (e.g. -xAVX).</p>
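<p>As a minimal sketch (assuming the Intel compiler; the makefile location and the
-g/-O3 options are illustrative, and the -x option should match your compute-node
architecture), the relevant Makefile.machine lines might look like:</p>
<div class="highlight-default"><div class="highlight"><pre>
# excerpt from a hypothetical src/MAKE/MINE/Makefile.machine (CPU-only build)
CC =        mpiicpc
CCFLAGS =   -g -O3 -fopenmp -DLAMMPS_MEMALIGN=64 -restrict -xHost -fno-alias -ansi-alias -override-limits
LINK =      mpiicpc
LINKFLAGS = -g -O3 -fopenmp -xHost
# for Phi offload support, additionally add -DLMP_INTEL_OFFLOAD to CCFLAGS and -offload to LINKFLAGS
</pre></div>
</div>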
<p>Example machines makefiles Makefile.intel_cpu and Makefile.intel_phi
are included in the src/MAKE/OPTIONS directory with settings that
perform well with the Intel(R) compiler. The latter has support for
offload to Phi coprocessors; the former does not.</p>
<p><strong>Run with the USER-INTEL package from the command line:</strong></p>
<p>The mpirun or mpiexec command sets the total number of MPI tasks used
by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. For example, the mpirun command in MPICH does this via
its -np and -ppn switches; OpenMPI uses -np and -npernode.</p>
<p>If you compute (any portion of) pairwise interactions using USER-INTEL
pair styles on the CPU, or use USER-OMP styles on the CPU, you need to
choose how many OpenMP threads per MPI task to use. If both packages
are used, it must be done for both packages, and the same thread count
value should be used for both. Note that the product of MPI tasks *
threads/task should not exceed the physical number of cores (on a
node), otherwise performance will suffer.</p>
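<p>For example (a sketch only, reusing the hypothetical 16-core nodes from above),
8 MPI tasks/node with 2 OpenMP threads/task keeps the product of tasks and threads
equal to the core count per node:</p>
<div class="highlight-default"><div class="highlight"><pre>
# sketch for MPICH on 4 nodes of 16 cores each: 8 tasks/node x 2 threads/task = 16 cores/node
mpirun -np 32 -ppn 8 lmp_machine -sf intel -pk intel 0 omp 2 -in in.script
</pre></div>
</div>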
<p>When using the USER-INTEL package for the Phi, you also need to
specify the number of coprocessors/node and optionally the number of
coprocessor threads per MPI task to use. Note that coprocessor
threads (which run on the coprocessor) are totally independent from
OpenMP threads (which run on the CPU). The default values for the
settings that affect coprocessor threads are typically fine, as
discussed below.</p>
<p>As in the lines above, use the &#8220;-sf intel&#8221; or &#8220;-sf hybrid intel omp&#8221;
<a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a>, which will
automatically append &#8220;intel&#8221; to styles that support it. In the second
case, &#8220;omp&#8221; will be appended if an &#8220;intel&#8221; style does not exist.</p>
<p>Note that if either switch is used, it also invokes a default command:
<a class="reference internal" href="package.html"><span class="doc">package intel 1</span></a>. If the &#8220;-sf hybrid intel omp&#8221; switch
is used, the default USER-OMP command <a class="reference internal" href="package.html"><span class="doc">package omp 0</span></a> is
also invoked (if LAMMPS was built with USER-OMP). Both set the number
of OpenMP threads per MPI task via the OMP_NUM_THREADS environment
variable. The first command sets the number of Xeon Phi(TM)
coprocessors/node to 1 (ignored if USER-INTEL is built for CPU-only),
and the precision mode to &#8220;mixed&#8221; (default value).</p>
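<p>In other words, a run launched with &#8220;-sf hybrid intel omp&#8221; behaves roughly as
if the input script began with the following defaults (a sketch; the OpenMP thread
count comes from the OMP_NUM_THREADS environment variable):</p>
<div class="highlight-default"><div class="highlight"><pre>
package intel 1           # default: 1 coprocessor/node (ignored for CPU-only builds), mixed precision
package omp 0             # default: OpenMP thread count taken from OMP_NUM_THREADS
suffix hybrid intel omp
</pre></div>
</div>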
<p>You can also use the &#8220;-pk intel Nphi&#8221; <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> to explicitly set Nphi = # of Xeon
Phi(TM) coprocessors/node, as well as additional options. Nphi should
be &gt;= 1 if LAMMPS was built with coprocessor support, otherwise Nphi
= 0 for a CPU-only build. All the available coprocessor threads on
each Phi will be divided among MPI tasks, unless the <em>tptask</em> option
of the &#8220;-pk intel&#8221; <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> is
used to limit the coprocessor threads per MPI task. See the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command for details, including the default values
used for all its options if not specified, and how to set the number
of OpenMP threads via the OMP_NUM_THREADS environment variable if
desired.</p>
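<p>A hedged example on a hypothetical node with 2 coprocessors, limiting the
coprocessor threads per MPI task with the <em>tptask</em> option (the values are
illustrative only):</p>
<div class="highlight-default"><div class="highlight"><pre>
# sketch: 1 node, 16 MPI tasks, 2 coprocessors/node, at most 15 coprocessor threads per MPI task
mpirun -np 16 lmp_machine -sf intel -pk intel 2 tptask 15 -in in.script
</pre></div>
</div>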
<p>If LAMMPS was built with the USER-OMP package, you can also use the
&#8220;-pk omp Nt&#8221; <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switch</span></a> to
explicitly set Nt = # of OpenMP threads per MPI task to use, as well
as additional options. Nt should be the same threads per MPI task as
set for the USER-INTEL package, e.g. via the &#8220;-pk intel Nphi omp Nt&#8221;
command. Again, see the <a class="reference internal" href="package.html"><span class="doc">package omp</span></a> command for
details, including the default values used for all its options if not
specified, and how to set the number of OpenMP threads via the
OMP_NUM_THREADS environment variable if desired.</p>
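<p>A brief sketch with matching thread counts for the two packages (values are
illustrative):</p>
<div class="highlight-default"><div class="highlight"><pre>
# sketch: 4 OpenMP threads/task for both USER-INTEL and USER-OMP styles
mpirun -np 8 lmp_machine -sf hybrid intel omp -pk intel 0 omp 4 -pk omp 4 -in in.script
</pre></div>
</div>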
<p><strong>Or run with the USER-INTEL package by editing an input script:</strong></p>
<p>The discussion above for the mpirun/mpiexec command, MPI tasks/node,
OpenMP threads per MPI task, and coprocessor threads per MPI task is
the same.</p>
<p>Use the <a class="reference internal" href="suffix.html"><span class="doc">suffix intel</span></a> or <a class="reference internal" href="suffix.html"><span class="doc">suffix hybrid intel omp</span></a> commands, or you can explicitly add an &#8220;intel&#8221; or
&#8220;omp&#8221; suffix to individual styles in your input script, e.g.</p>
<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">pair_style</span> <span class="n">lj</span><span class="o">/</span><span class="n">cut</span><span class="o">/</span><span class="n">intel</span> <span class="mf">2.5</span>
</pre></div>
</div>
<p>You must also use the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command, unless the
&#8220;-sf intel&#8221; or &#8220;-pk intel&#8221; <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switches</span></a> were used. It specifies how many
coprocessors/node to use, as well as other OpenMP threading and
coprocessor options. The <a class="reference internal" href="package.html"><span class="doc">package</span></a> doc page explains how
to set the number of OpenMP threads via an environment variable if
desired.</p>
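<p>A minimal input-script sketch combining these commands (values are illustrative;
the <em>omp</em> and <em>mode</em> keywords are optional):</p>
<div class="highlight-default"><div class="highlight"><pre>
package intel 0 omp 16 mode mixed   # CPU-only: no coprocessors, 16 threads/task, mixed precision
suffix intel                        # append "intel" to styles that support it
pair_style lj/cut 2.5               # will run as lj/cut/intel
</pre></div>
</div>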
<p>If LAMMPS was also built with the USER-OMP package, you must also use
the <a class="reference internal" href="package.html"><span class="doc">package omp</span></a> command to enable that package, unless
the &#8220;-sf hybrid intel omp&#8221; or &#8220;-pk omp&#8221; <a class="reference internal" href="Section_start.html#start-7"><span class="std std-ref">command-line switches</span></a> were used. It specifies how many
OpenMP threads per MPI task to use (should be same as the setting for
the USER-INTEL package), as well as other options. Its doc page
explains how to set the number of OpenMP threads via an environment
variable if desired.</p>
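<p>With the USER-OMP fallback enabled as well, the corresponding sketch adds a
matching thread count (illustrative value):</p>
<div class="highlight-default"><div class="highlight"><pre>
package intel 0 omp 16 mode mixed
package omp 16                      # same threads/task as given to package intel
suffix hybrid intel omp
</pre></div>
</div>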
<p><strong>Speed-ups to expect:</strong></p>
<p>If LAMMPS was not built with coprocessor support (CPU only) when
including the USER-INTEL package, then accelerated styles will run on
the CPU using vectorization optimizations and the specified precision.
This may give a substantial speed-up for a pair style, particularly if
mixed or single precision is used.</p>
<p>If LAMMPS was built with coprocessor support, the pair styles will run
on one or more Intel(R) Xeon Phi(TM) coprocessors (per node). The
performance of a Xeon Phi versus a multi-core CPU is a function of
your hardware, which pair style is used, the number of
atoms/coprocessor, and the precision used on the coprocessor (double,
single, mixed).</p>
<p>See the <a class="reference external" href="http://lammps.sandia.gov/bench.html">Benchmark page</a> of the
LAMMPS web site for performance of the USER-INTEL package on different
hardware.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Setting core affinity is often used to pin MPI tasks and OpenMP
threads to a core or group of cores so that memory access can be
uniform. Unless disabled at build time, affinity for MPI tasks and
OpenMP threads on the host (CPU) will be set by default
when using offload to a coprocessor. In this case, it is unnecessary
to use other methods to control affinity (e.g. taskset, numactl,
I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input script with
the <em>no_affinity</em> option to the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command
or by disabling the option at build time (by adding
-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
Disabling this option is not recommended, especially when running on a
machine with hyperthreading disabled.</p>
</div>
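<p>A sketch of the two ways to disable the default affinity behavior described in
the note (not generally recommended):</p>
<div class="highlight-default"><div class="highlight"><pre>
# in the input script:
package intel 1 no_affinity
# or at build time, by appending to the CCFLAGS line of Makefile.machine:
#   -DINTEL_OFFLOAD_NOAFFINITY
</pre></div>
</div>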
<p><strong>Guidelines for best performance on an Intel(R) Xeon Phi(TM)
coprocessor:</strong></p>
<ul class="simple">
<li>The default for the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command is to have
all the MPI tasks on a given compute node use a single Xeon Phi(TM)
coprocessor. In general, running with a large number of MPI tasks on
each node will perform best with offload. Each MPI task will
automatically get affinity to a subset of the hardware threads
available on the coprocessor. For example, if your card has 61 cores,
with 60 cores available for offload and 4 hardware threads per core
(240 total threads), running with 24 MPI tasks per node will cause
each MPI task to use a subset of 10 threads on the coprocessor. Fine
tuning of the number of threads to use per MPI task or the number of
threads to use per core can be accomplished with keyword settings of
the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command.</li>
<li>If desired, only a fraction of the pair style computation can be
offloaded to the coprocessors. This is accomplished by using the
<em>balance</em> keyword in the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command. A
balance of 0 runs all calculations on the CPU. A balance of 1 runs
all calculations on the coprocessor. A balance of 0.5 runs half of
the calculations on the coprocessor. Setting the balance to -1 (the
default) will enable dynamic load balancing that continuously adjusts
the fraction of offloaded work throughout the simulation. This option
typically produces results within 5 to 10 percent of the optimal fixed
balance. A short usage sketch appears after this list.</li>
<li>When using offload with CPU hyperthreading disabled, it may help
performance to use fewer MPI tasks and OpenMP threads than available
cores. This is due to the fact that additional threads are generated
internally to handle the asynchronous offload tasks.</li>
<li>If running short benchmark runs with dynamic load balancing, adding a
short warm-up run (10-20 steps) will allow the load-balancer to find a
near-optimal setting that will carry over to additional runs.</li>
<li>If pair computations are being offloaded to an Intel(R) Xeon Phi(TM)
coprocessor, a diagnostic line is printed to the screen (not to the
log file), during the setup phase of a run, indicating that offload
mode is being used and indicating the number of coprocessor threads
per MPI task. Additionally, an offload timing summary is printed at
the end of each run. When offloading, the frequency for <a class="reference internal" href="atom_modify.html"><span class="doc">atom sorting</span></a> is changed to 1 so that the per-atom data is
effectively sorted at every rebuild of the neighbor lists.</li>
<li>For simulations with long-range electrostatics or bond, angle,
dihedral, improper calculations, computation and data transfer to the
coprocessor will run concurrently with computations and MPI
communications for these calculations on the host CPU. The USER-INTEL
package has two modes for deciding which atoms will be handled by the
coprocessor. This choice is controlled with the <em>ghost</em> keyword of
the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a> command. When set to 0, ghost atoms
(atoms at the borders between MPI tasks) are not offloaded to the
card. This allows for overlap of MPI communication of forces with
computation on the coprocessor when the <a class="reference internal" href="newton.html"><span class="doc">newton</span></a> setting
is &#8220;on&#8221;. The default depends on the style being used; however,
better performance may be achieved by setting this option
explicitly, as in the sketch after this list.</li>
</ul>
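<p>A short sketch of the <em>balance</em> and <em>ghost</em> keywords referenced in
the list above (values are illustrative; see the <a class="reference internal" href="package.html"><span class="doc">package intel</span></a>
doc page for the exact keyword values):</p>
<div class="highlight-default"><div class="highlight"><pre>
package intel 1 balance 0.5   # offload half of the pair computation to the coprocessor
package intel 1 ghost no      # keep ghost atoms on the host to overlap MPI force communication
</pre></div>
</div>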
<div class="section" id="restrictions">
<h2>Restrictions</h2>
<p>When offloading to a coprocessor, <a class="reference internal" href="pair_hybrid.html"><span class="doc">hybrid</span></a> styles
that require skip lists for neighbor builds cannot be offloaded.
Using <a class="reference internal" href="pair_hybrid.html"><span class="doc">hybrid/overlay</span></a> is allowed. Only one intel
accelerated style may be used with hybrid styles.
<a class="reference internal" href="special_bonds.html"><span class="doc">Special_bonds</span></a> exclusion lists are not currently
supported with offload; however, the same effect can often be
accomplished by setting cutoffs for excluded atom types to 0. None of
the pair styles in the USER-INTEL package currently support the
&#8220;inner&#8221;, &#8220;middle&#8221;, &#8220;outer&#8221; options for rRESPA integration via the
<a class="reference internal" href="run_style.html"><span class="doc">run_style respa</span></a> command; only the &#8220;pair&#8221; option is
supported.</p>
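<p>For rRESPA this means the pair force can only be assigned to a single level via
the &#8220;pair&#8221; keyword, as in this sketch (level counts are illustrative):</p>
<div class="highlight-default"><div class="highlight"><pre>
run_style respa 2 2 bond 1 pair 2   # supported: whole pair force at one rRESPA level
</pre></div>
</div>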
</div>
</div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
&copy; Copyright 2013 Sandia Corporation.
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/lightbox.min.js"></script>
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2-customize/jquery-noconflict.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>
</body>
</html>
